Abstract


We present a new proposal for a trapdoor one-way function, from which we 
derive public-key encryption and digital signatures. The security of the new con- 
struction is based on the conjectured computational difficulty of lattice-reduction 
problems, providing a possible alternative to existing public-key encryption algo- 
rithms and digital signatures such as RSA and DSS. 
1 Introduction


The need for public-key encryption and digital signatures is spreading rapidly today as more people 
use computer networks to exchange confidential documents, buy products and access sensitive data. 
In fact, several of these tasks are impossible to achieve without the availability of good (secure and 
efficient) public-key cryptography. 

In light of the importance of public key cryptography, it is surprising that there are relatively 
few proposals of public key cryptosystems which have received any attention. Moreover, the source 
of security of these proposals almost always relies on the (apparent) computational intractability 
of problems in finite integer rings, specifically integer factorization and discrete logarithm com- 
putations. In this paper we propose a new public key encryption algorithm and digital signature 
scheme whose security relies on the computational difficulty of lattice reduction problems, in par- 
ticular the problem of finding closest vectors in a lattice to a given point (CVP). For comparison 
with existing schemes, we first quickly review some of the most famous public-key encryption and 
digital signatures proposals, with emphasis on the computational problems their security is based 
on. 


1.1 Previous proposals


The security of the RSA cryptosystem [RSA] is related to the difficulty of integer factorization
in the sense that discovering the secret key is as hard as factoring integers, although the actual
cryptanalysis problem is potentially easier than factoring integers. Other methods whose security
relies on the difficulty of factoring integers include Rabin’s digital signature method [Ra79] (and
its variants, e.g., [Wi84]), the semantically-secure public-key encryption of [GM82, BG84], and
the existentially unforgeable signature schemes of [GMR85].

The security of the Diffie-Hellman public-key encryption scheme¹ is related to the problem of
computing discrete logarithms (DLP) in finite fields in the sense that finding the secret key is
as hard as computing discrete logarithms. Again, the actual cryptanalysis problem is potentially
easier than discrete log computation. The digital signature method of El-Gamal [El85] (and its
DSS modification [DSS]) is also no harder to break than it is to solve discrete logarithms in finite
fields. A paradigm similar to the above discrete-log-based schemes can be carried out over elliptic
curves. In that case, the underlying computational problem is the Elliptic Logarithm problem,
namely computing logarithms in the additive group of points on an elliptic curve.

The McEliece public-key encryption scheme [Mc79] is substantially different from the above 
proposals, in that its security is based on a problem from algebraic coding theory. The security 
of this scheme is based on the conjecture that decoding with a “random looking” linear code is 
as hard as decoding with a truly random linear code, and on the widely believed intractability of 
decoding with random linear codes. In terms of efficiency, encryption and decryption amount to a
matrix-by-vector multiplication, which takes time quadratic in the natural security parameter (i.e.,
the dimension of the matrix). This compares favorably to the cubic time required by RSA and the
other number-theoretic proposals above, yet the size of the public key is larger than in the case of
RSA (i.e., quadratic rather than linear). The best known cryptanalytic attack against the McEliece
system takes time exponential in the dimension of the code, yet the security of the McEliece 
system has not been studied as extensively as the RSA system. No digital signature scheme based 
on algebraic coding theory has been proposed to accompany the public-key encryption scheme. 


¹ A straightforward modification of their earlier key-exchange protocol [DH76].


In addition, there are general constructions of (semantically-secure [GM82]) public-key
encryption schemes based on any trapdoor function [Ya82]. Interestingly, digital signature schemes
which are existentially unforgeable [GMR85] can be constructed based on any one-way
function [NY89, Ro90], without the need for a trapdoor. Thus one may benefit from the slightly more extended
variety of candidate one-way functions which, in addition to the above, include a candidate based 
on the conjectured intractability of decoding random linear codes [GKL] and Ajtai’s recent candi- 
date [Aj96] which is based on the worst-case complexity of approximating the shortest vector in 
a lattice. Unfortunately, these general constructions for digital signatures (i.e., of [NY89, Ro90]) 
tend to be inefficient. 


1.2 The new proposal 


In this paper we propose a new trapdoor one-way function relying on the computational difficulty 
of lattice reduction problems, in particular the problem of finding closest vectors in a lattice to a 
given point (CVP). 

Starting with this trapdoor function, we derive public-key encryption and digital signature
methods which are asymptotically more efficient than RSA and its variants, in that the computation
time for encryption, decryption, signing, and verifying is quadratic in the natural security
parameter. The size of the public key, however, is larger than for the RSA system. Specifically,
for security parameter k, the length of the RSA public key is k and the computation time is
O(k^3), whereas for the new scheme the public key is of size O(k^2) and the computation time is
O(k^2). Thus, our complexities are as in the McEliece encryption scheme [Mc79]. We feel that it is
high time to reconsider the belief that shorter (private and public) keys are preferable to faster 
encryption and decryption time (or signing and verification for signatures). In particular, space 
and communication costs (associated with keys) in Internet applications seem to be less restricted 
than envisioned for public-key cryptography applications 20 years ago. 


Our trapdoor function. The idea underlying our construction is that, given any basis for a
lattice, it is easy to generate a vector which is close to a lattice point (i.e., by taking a lattice point
and adding to it a small error vector). However, it seems hard to return from this “close-to-lattice”
vector to the original lattice point (given an arbitrary lattice basis). Thus, the operation of adding
a small error vector to a lattice point can be thought of as a one-way computation.

In order to introduce a trapdoor mechanism into this one-way computation, we use the fact
that different bases of the same lattice seem to differ in their ability to find lattice points close
to arbitrary vectors in ℝ^n. Therefore the trapdoor information may be a basis of the lattice
which allows a very good approximation of the closest lattice point problem. Thus, we use two
different bases of the same lattice. One basis is chosen to allow computing the function but not
inverting it, while the other basis is chosen to allow computing the inverse function by permitting
a good approximation to the closest lattice vector problem (CVP). For the sake of the introduction,
we simply call such a basis a reduced basis. In Section 2, we define a reduced basis to be one
with a small dual orthogonality defect (where ‘small’ is a parameter). Below we give an informal
description of our trapdoor one-way function which uses the above ideas.

The parameters of the system include the security parameter n (which is the dimension of
the lattices that we work with) and a “threshold” parameter σ which determines the size of the
error vectors which we add to the lattice points (say, in L_∞ norm).

A particular function and its trapdoor information are specified by a pair of bases of the same
(full-rank) lattice in ℝ^n: a “non-reduced” basis B, which is used to compute the function, and a
reduced basis R, which serves as the trapdoor information and is used for inversion. The “reduced”
basis is selected “uniformly” and the “non-reduced” basis is derived from it using a randomized
unimodular transformation.

The input to the function is a lattice point (which is specified by an integral linear combination
of the columns of B) and an error vector whose size is bounded by σ. The value of the function on
this input is just the vector sum of the two points. To invert the function, we use the reduced basis
R in one of Babai’s nearest-vector approximation algorithms [Ba86] to find a lattice point which is
at most σ away from the given vector.

The cryptanalytic problem underlying our scheme is to approximate the closest vector problem 
(CVP) in a lattice, given a “non-reduced” basis for that lattice. A related problem is the problem 
of reducing the given public basis (since one obvious attack is to reduce the given basis and then 
use the result for inverting the function). See Section 2.1 for a description of these computational 
problems in lattices. 
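To make the trapdoor intuition concrete, here is a minimal two-dimensional sketch in Python. The matrices are our own illustrative choices and are far too small to be secure (in dimension 2 any basis can be reduced instantly): R has nearly orthogonal columns, and B = RU for the hand-picked unimodular matrix U = [[5, 4], [6, 5]], so both span the same lattice. The helper names (inv2, mat_vec, round_off) are hypothetical. Both bases are fed to Babai's round-off procedure, the inversion method of Section 3.1, on the same perturbed point.

```python
from fractions import Fraction
from math import floor


def inv2(M):
    """Exact inverse of a non-singular 2x2 integer matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_vec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def round_off(basis, c):
    """Babai round-off: express c in the basis, round the coefficients, map back."""
    coeffs = [floor(t + Fraction(1, 2)) for t in mat_vec(inv2(basis), c)]
    return mat_vec(basis, coeffs)


R = [[10, 1], [1, 10]]     # "reduced" basis: columns (10, 1) and (1, 10)
B = [[56, 45], [65, 54]]   # B = R U with U = [[5, 4], [6, 5]], det(U) = 1
v, e = [2, -1], [3, -3]    # lattice coordinates and a small error vector
point = mat_vec(B, v)                    # the lattice point Bv = [67, 76]
c = [p + q for p, q in zip(point, e)]    # perturbed point c = Bv + e = [70, 73]

print(round_off(R, c))   # [67, 76]: the original lattice point
print(round_off(B, c))   # [55, 55]: a different, far-away lattice point
```

With R the rounding recovers the original lattice point Bv; with the skewed basis B it lands on another lattice point, which is the gap the trapdoor is meant to exploit.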


From trapdoor function to encryption scheme. In order to use the above trapdoor function 
for public-key encryption, we need a way to embed the message in the arguments to this function. 
There are several ways to do that, and we discuss some of them in Section 4.2. Here we only 
describe one of them, in which the message is embedded in the lattice point. 

The private and public keys of a user are a pair of bases of the same lattice of
dimension n (the security parameter). The public basis allows encryption, whereas the private
basis is chosen to allow decryption. To encrypt a message, we first map it to a lattice point by
taking the integer combination of the public basis vectors whose coefficients are specified by the
message, and then add to the lattice point a “small error vector” chosen at random. To decrypt, we look for a lattice
point which is close to the ciphertext. By using the private basis, which is a reduced basis, the
correct decryption is obtained with high probability. We remark that our encryption algorithm is
similar in its algorithmic nature to McEliece’s scheme [Mc79].


Our signature scheme. Our signature scheme is similar to the encryption scheme. Regard the
message as an n-dimensional vector over the reals. Then a signature of such a vector is a lattice point
which is “close” to it (where closeness is defined by a published threshold). The private basis is
reduced so that finding “close” points is possible. Verifying correctness amounts to checking that
a signature is indeed a lattice point and that the message is close to the signature.
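A minimal sketch of the signature idea under stated assumptions: messages are integer vectors in ℤ^2, signing uses Babai's round-off with the private basis, and the published threshold is an L∞ bound that we set, as an illustrative assumption, to half the largest row L_1 norm of R (the largest deviation round-off can produce with this R). The toy matrices and the names sign and verify are hypothetical.

```python
from fractions import Fraction
from math import floor


def inv2(M):
    """Exact inverse of a non-singular 2x2 integer matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_vec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def sign(R, m):
    """Signature of m: the lattice point found by round-off with the private basis R."""
    coeffs = [floor(t + Fraction(1, 2)) for t in mat_vec(inv2(R), m)]
    return mat_vec(R, coeffs)


def verify(B, m, s, threshold):
    """Accept iff s has integral coordinates w.r.t. public B and is within threshold of m (L_inf)."""
    on_lattice = all(t.denominator == 1 for t in mat_vec(inv2(B), s))
    close = max(abs(x - y) for x, y in zip(m, s)) <= threshold
    return on_lattice and close


R = [[10, 1], [1, 10]]     # private basis (columns)
B = [[56, 45], [65, 54]]   # public basis, B = R U with U = [[5, 4], [6, 5]]
threshold = max(sum(abs(x) for x in row) for row in R) / 2   # 5.5 (assumed choice)

s = sign(R, [70, 73])
print(s)                                           # [67, 76]
print(verify(B, [70, 73], s, threshold))           # True
print(sign(R, [69, 73]) == s)                      # True: a nearby message, same signature
print(verify(B, [70, 73], [55, 55], threshold))    # False: a lattice point, but too far away
```

The third line illustrates the property discussed next: messages that are close as points in ℝ^n receive the same signature.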

It is important to remark at the outset that messages which are close to each other will have
the same signature. When applying the method in a setting where this property is desirable (e.g.,
signing analog signals which may change a little over time), this feature is of great benefit. However,
when applying the method to a message space where such a property is undesirable, we propose to
first hash the message and only then sign it. This is also good practice in case the scheme
is subject to a chosen message attack, as otherwise being able to obtain different signatures of
two messages which are close to each other when viewed as points in ℝ^n (the difference of two
such signatures is a short lattice vector) will imply the ability to
compute a small basis for the lattice, which in turn will enable the attacker to find close vectors
in the lattice and break the scheme. Interestingly, a family of collision-free hash functions can be
constructed assuming that Lattice-Reduction is hard in the worst case (see below).


1.3 Discussion


Our work was inspired by a remarkable result of Ajtai [Aj96], who introduced a function which
is provably a one-way function if approximating the shortest non-zero vector (SVP) in a lattice
is hard in the worst case. Ajtai’s work may be viewed as exhibiting a samplable distribution on
lattices and proving that approximating the shortest non-zero vector in lattices chosen according
to this distribution is as hard as the worst-case instance of approximating the shortest non-zero
vector in a lattice. Ajtai’s construction, however, does not provide a trapdoor function and thus
does not provide a way of doing public-key encryption based on lattice problems. Constructing
such a trapdoor function is the novelty and focus of our work.

We remark that the construction of [NY89] can be applied to the one-way function of Ajtai and 
thus yield a signature scheme based on the SVP problem. However, this construction is generic 
and thus quite inefficient. In contrast, the signature scheme which we suggest based on the new 
trapdoor function is more efficient, and based on the computational difficulty of the CVP. Alas, the 
distribution over CVP instances, induced by our construction, is not known to enjoy the “hardness 
of the worst-case” property of Ajtai’s result. 

In retrospect, our encryption scheme bears much similarity to McEliece’s scheme [Mc79]. His 
scheme utilizes a pair of matrices over GF(2), which correspond to two representations of the
same linear code. The encryption method is probabilistic: one multiplies the public matrix by 
the message vector and adds a random noise vector to the resulting codeword. Thus in both 
McEliece and our encryption scheme, encryption amounts to a matrix-by-vector multiplication 
and the addition of a suitable random vector to the result. However, the domains in which these 
operations take place are vastly different and so is the algebra. Another difference is in the way the 
private-key is generated. In McEliece’s scheme the private-key is a random Goppa code and has 
structure essential for legitimate decoding. In our scheme the private-key can be chosen uniformly 
and thus is “structure-less” — legitimate decoding merely depends on a property of such random 
choices. In both schemes the public-key is obtained by a suitable random linear transformation of 
the private-key; however, in our scheme the choice of this transformation seems richer. In general, 
we believe that McEliece’s suggestion as well as ours deserve further investigation, especially due 
to the difference in computational complexity required from the legal sender and receiver in these 
schemes as compared with the factoring/DLP based schemes. 

What can we prove about the security of our proposal? Since complexity theory has yet to
produce a non-linear lower bound for even one NP-complete problem, our proposal is essentially
based on the failure of past research efforts to come up with efficient algorithms for the relevant 
lattice approximation tasks (i.e., SVP and CVP). Using the best known algorithms for approximat- 
ing the closest vector problem we show in Section 6 that a natural attack on our trapdoor function 
takes exponential time in the dimension of the lattice. In particular, according to our estimates 
this attack should be intractable in practice for dimension 200. 

Drawing an analogy from the past, in proposing RSA, Rivest et al. [RSA] relied on the
failure of past research to produce efficient factoring algorithms, but did not reduce factoring to
the breaking of their proposal. By now, the assumption that RSA itself is hard to invert (rather
than factoring in general) can stand on its own, as it has been subjected to much examination and
scrutiny. The structure of our proposal (i.e., the key generation process) is more complex than in
RSA and requires stating a more complex assumption. Essentially, we need to conjecture that for
some natural distribution on lattices and bases for these lattices, the CVP is hard. We don’t know
if a result similar to Ajtai’s can be proved for the distribution which we propose over the CVP
instances. (Similarly, it is not known whether such a result can be proved for RSA.) We hope that
our suggestion will stir up further investigation into the complexity of lattice problems.


1.4 Organization 


In Section 2 we review necessary material about lattices and lattice problems. In Section 3 we
describe our construction of a trapdoor function and discuss various parameters and attacks. Section
4 describes our encryption scheme and Section 5 describes our signature scheme. In Section 6
we describe our experimental results.


2 Lattices and Lattice Reduction Problems 


Notations and conventions. In the sequel we use the following conventions: we denote the set
of real numbers by ℝ and the set of integers by ℤ. We denote real numbers by small Greek letters
(e.g., σ, ρ, γ, etc.) and integers by one of the letters i, j, k, l, m, n. We denote vectors by boldface
lowercase letters (e.g., b, c, r, etc.). We use capital letters (e.g., B, C, R, etc.) to denote matrices or
sets of vectors.

In this paper we only care about lattices of full rank, so the definitions below only deal with 
those. 


Definition 1: Given a set of n linearly independent vectors in ℝ^n, B = {b_1, ..., b_n}, we define
the lattice spanned by B as the set of all possible linear combinations of the b_i’s with integral
coefficients, namely

$$ L(B) \stackrel{\mathrm{def}}{=} \left\{ \sum_{i=1}^{n} k_i b_i \;:\; k_i \in \mathbb{Z} \text{ for all } i \right\} $$

We call B a basis of the lattice L(B). We say that a set of vectors L ⊆ ℝ^n is a lattice if there is
a basis B such that L = L(B). If the vector v belongs to the lattice L, then we say that v is a
lattice vector (or a lattice point).


Below we briefly mention a few well-known facts about lattices. In the sequel we view a basis
for a lattice in ℝ^n as an n × n non-singular matrix B whose columns are the basis vectors. Viewed
this way, the lattice spanned by B is the set L(B) = {Bv : v is an integral vector}. We note that
there are many different bases for any lattice L. In fact, if the set B = {b_1, ..., b_n} spans some
lattice, then by taking any vector b_i ∈ B and adding to it any integral linear combination of the
other vectors we obtain a different basis for the same lattice.


All bases have the same determinant. The first important fact about lattices is that all the
bases of a given lattice have the same determinant, up to sign. This fact follows since there is an integer matrix
T such that BT = C and another integer matrix T^{-1} such that CT^{-1} = B; since det(T) and det(T^{-1})
are integers whose product is 1, det(T) = ±1, and hence |det(B)| = |det(C)|.
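The following sketch (toy 2 × 2 matrices, a fixed random seed, and hypothetical helper names) applies random elementary column operations of the kind described above, adding an integer multiple of one basis vector to another, and then checks that both transition matrices T and T^{-1} are integral and that the determinant is unchanged. Column additions have determinant exactly 1, so in this particular case even the sign is preserved.

```python
import random
from fractions import Fraction


def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c


def inv2(M):
    """Exact inverse of a non-singular 2x2 integer matrix."""
    (a, b), (c, d) = M
    det = det2(M)
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add_multiple_of_column(M, i, j, k):
    """Return a copy of M in which column i is replaced by (column i) + k * (column j)."""
    return [[x + k * row[j] if col == i else x for col, x in enumerate(row)] for row in M]


rng = random.Random(0)
B = [[10, 1], [1, 10]]          # columns b_1 = (10, 1), b_2 = (1, 10)
C = B
for _ in range(10):             # ten random elementary column operations
    i = rng.randrange(2)
    C = add_multiple_of_column(C, i, 1 - i, rng.choice([-3, -2, -1, 1, 2, 3]))

T = mat_mul(inv2(B), C)         # B T = C
T_inv = mat_mul(inv2(C), B)     # C T^{-1} = B
print(all(t.denominator == 1 for row in T + T_inv for t in row))   # True: both are integer matrices
print(det2(B), det2(C))                                             # 99 99
```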


The dual lattice. Let B = {b_1, ..., b_n} be a basis for some lattice in ℝ^n, L = L(B). Recall that
we think of B as an n × n matrix whose columns are the b_i’s. The dual lattice of L is the lattice
which is spanned by the rows of the matrix B^{-1}. Let us denote the rows of B^{-1} by b̂_1, ..., b̂_n.


Orthogonality defect. The notion of the orthogonality defect of a basis, which was introduced
by Schnorr in [Sc87], plays a crucial role in the security of our schemes.


Definition 2: Let B be a real non-singular n × n matrix. The orthogonality defect of B is defined
as

$$ \text{orth-defect}(B) \stackrel{\mathrm{def}}{=} \frac{\prod_i \|b_i\|}{|\det(B)|}, $$

where ||b_i|| is the Euclidean norm of the i’th column in B.


By Hadamard’s inequality, orth-defect(B) ≥ 1, with orth-defect(B) = 1 if and only if the columns of B
are orthogonal to one another. When comparing different bases of the same lattice in ℝ^n, we
really only care about the product of the ||b_i||’s, since |det(B)| is the same for all of them (and serves
just as a normalization factor). In Section 3.5 we demonstrate the importance of the orthogonality
defect to the security of our schemes. In particular, we show that when we use a basis B for a lattice
L = L(B) for our trapdoor function, the workload associated with a brute-force attack
on the scheme is proportional to the orthogonality defect of the corresponding basis for the dual
lattice. It is therefore convenient for us to define the dual orthogonality defect of a matrix.


Definition 3: Let B be a real non-singular n × n matrix. The dual orthogonality defect of B is
defined as

$$ \text{orth-defect}^*(B) \stackrel{\mathrm{def}}{=} \frac{\prod_i \|\hat{b}_i\|}{|\det(B^{-1})|} = |\det(B)| \cdot \prod_i \|\hat{b}_i\|, $$

where ||b̂_i|| is the Euclidean norm of the i’th row in B^{-1}.
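A small Python sketch of Definitions 2 and 3 for 2 × 2 bases. The matrices are illustrative assumptions: R has nearly orthogonal columns, and B = RU with the unimodular U = [[5, 4], [6, 5]], so both span the same lattice. In dimension 2 the two defects always coincide, because the rows of B^{-1} are the columns of B rotated by 90 degrees and scaled by 1/det(B); this toy therefore shows the gap between a "reduced" and a skewed basis, not the difference between the two notions.

```python
import math


def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c


def inv2(M):
    """Floating-point inverse of a non-singular 2x2 matrix."""
    (a, b), (c, d) = M
    det = det2(M)
    return [[d / det, -b / det], [-c / det, a / det]]


def norm(x):
    return math.sqrt(sum(t * t for t in x))


def orth_defect(B):
    """Definition 2: product of the column norms divided by |det(B)|."""
    return math.prod(norm(col) for col in zip(*B)) / abs(det2(B))


def dual_orth_defect(B):
    """Definition 3: |det(B)| times the product of the row norms of B^{-1}."""
    return abs(det2(B)) * math.prod(norm(row) for row in inv2(B))


R = [[10, 1], [1, 10]]     # nearly orthogonal columns
B = [[56, 45], [65, 54]]   # same lattice, B = R U with U = [[5, 4], [6, 5]]
for name, M in (("R", R), ("B", B)):
    print(name, round(orth_defect(M), 2), round(dual_orth_defect(M), 2))
# R 1.02 1.02
# B 60.92 60.92
```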


2.1 Hard problems in lattices 


The security of our constructions is related to the (conjectured) intractability of a few computational 
problems in lattices. 


The Shortest non-zero Vector Problem (SVP). This problem underlies the security of
Ajtai’s construction and our collision-free hashing. In this problem we are given a basis B for
a lattice in ℝ^n and our task is to find a non-zero vector in L(B) whose Euclidean norm is
minimal. There are no known polynomial-time algorithms for solving the SVP, and it is also not
known whether the SVP is NP-hard (although a version of it, where the distance is measured in
L_∞ norm, was shown by van Emde Boas [Bo81] to be NP-hard). There are, however, deterministic
polynomial-time approximation algorithms for the SVP. The LLL algorithm (due to Lovász, Lenstra
and Lenstra [LLL]) approximates the SVP in ℝ^n up to a factor of 2^{n/2} in the worst case. This was
later improved by Schnorr [Sc87] to a factor of (1 + ε)^n for any ε > 0.
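LLL and Schnorr's algorithms are too long to reproduce here. In dimension 2, however, the classical Lagrange–Gauss reduction algorithm (an older and different algorithm, often described as the two-dimensional ancestor of LLL) solves the SVP exactly. The sketch below implements that algorithm, not LLL; its input is the skewed basis with columns (56, 65) and (45, 54), which spans the same lattice as the columns (10, 1) and (1, 10). The function names are hypothetical.

```python
from fractions import Fraction


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a 2-D integer lattice.

    Returns a reduced basis (u, v) in which u is a shortest non-zero lattice vector.
    """
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        m = round(Fraction(dot(u, v), dot(u, u)))   # nearest-integer projection coefficient
        v = (v[0] - m * u[0], v[1] - m * u[1])      # shorten v against u
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u


u, v = gauss_reduce((56, 65), (45, 54))
print(u, v, dot(u, u))   # (1, 10) (10, 1) 101
```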

No polynomial-time algorithm is known for approximating the SVP in ℝ^n within a polynomial
factor in n. Indeed, such an approximation has been conjectured to be infeasible to achieve. Recently,
Ajtai [Aj96] described samplable distributions which also form a “hard-core distribution”
for the SVP. Namely, any (probabilistic polynomial time) algorithm which can approximate the 
SVP problem with a polynomial approximation ratio on random instances drawn with Ajtai’s dis- 
tribution, can be transformed into a (probabilistic polynomial time) algorithm which achieves a 
polynomial approximation ratio on every instance of the SVP. 


The Closest Vector Problem (CVP). This is the “non-homogeneous” version of the SVP.
In this problem we are given a basis B for a lattice in ℝ^n and another vector v ∈ ℝ^n, and our
task is to find the vector in L(B) which is closest to v (in some norm). The CVP was shown by
van Emde Boas [Bo81] to be NP-hard. In terms of approximation, it was shown in [Ka] that any
algorithm which approximates the SVP to within a factor of p can be transformed into an algorithm
which approximates the CVP to within a factor of n^{3/2}p^2. Combined with Schnorr’s algorithm, this
yields a polynomial-time deterministic algorithm which approximates the CVP in ℝ^n to within a
factor of (1 + ε)^n for any ε > 0.

As we explain in Section 3, an attack against our trapdoor function amounts to finding an exact 
solution for some instance of CVP. 


The Smallest Basis Problem (SBP). In this problem, we are given a basis B for a lattice in
ℝ^n and our goal is to find the “smallest” basis B’ for the same lattice. There are many variants
of this problem, depending on the exact meaning of “smallest”. In the context of this paper, we
care about bases with small orthogonality defect. Thus, we consider the version in which we look
for the basis B’ of L(B) which has the smallest orthogonality defect. In our public-key constructions,
finding the private key from the public key requires solving SBP instances drawn from some distribution.

For this problem too there are no known polynomial-time algorithms, and the best polynomial-time
approximation algorithms for it are again the LLL and Schnorr’s algorithms, which achieve an
approximation ratio of 2^{O(n^2)} in the worst case for SBP instances in ℝ^n.


Worst case vs. average case. The upper-bounds above on the performance of the approxima- 
tion algorithms are all worst-case bounds. However, for the security of our scheme we are more 
interested in the performance of these algorithms “on the average”. In fact, typically the LLL 
algorithm and its variants perform much better than the above upper-bounds. 

The only known theoretical result about the difficulty of “average case” lattice problems is 
Ajtai’s result which we mentioned above [Aj96]. As we explained in the Introduction, however, 
we could not directly use Ajtai’s result for our scheme. We instead propose a trapdoor function 
and provide some empirical evidence for its security by testing the difficulty of the distribution
of lattice problems which is defined by our scheme against some known approximation algorithms 
with various parameters. We describe these tests in Section 6. 


3 A Candidate Trapdoor Function


In this section we define our candidate trapdoor function and analyze a few possible attacks against 
it. We start by reviewing the definition of a collection of trapdoor functions.


Definition 4: A collection of one-way trapdoor functions consists of four (probabilistic) polynomial-time
algorithms, GENERATE, SAMPLE, EVALUATE and INVERT:


GENERATE. The randomized algorithm GENERATE takes as input the security parameter (1^n) and
outputs a pair (f, t), where f describes a function and t is the trapdoor information. There is a
domain D_f which is associated with every function f.


SAMPLE. The randomized algorithm SAMPLE takes as input a function description f (which is
part of the output of GENERATE) and outputs a point x ∈ D_f. The random choices of this
algorithm induce a probability distribution over the domain D_f.


EVALUATE. The algorithm EVALUATE takes as input a function description f and a point x ∈ D_f
and returns the value f(x).


INVERT. The algorithm INVERT takes as input a function description f, the corresponding trapdoor
information t and a point y in the range of f, and returns a point x ∈ D_f for which f(x) = y.


We require that the INVERT algorithm be successful with high probability, where this proba- 
bility is taken over the random coin-tosses of GENERATE and SAMPLE, and over the coin-tosses 
of INVERT itself (if it happens to be randomized). 


A collection (GENERATE, SAMPLE, EVALUATE) is one-way if EVALUATE is a polynomial-time
algorithm, and for any probabilistic polynomial-time algorithm A, the probability that A succeeds in
inverting f, when it is only given 1^n, f and f(x), is negligible. The probability in this case is taken
over the coin tosses of GENERATE, SAMPLE and A itself, and is measured against the security
parameter which was the input to both GENERATE and A. Namely, we have

$$ \Pr\left[ A(1^n, f, f(x)) \in f^{-1}(f(x)) \right] = \mathrm{negligible}(n), $$

where the probability is taken over the choice of f by GENERATE, the choice of x by SAMPLE and
the internal coin tosses of A. (We say that a real-valued function is negligible in n if, as n gets
larger, this function becomes smaller than 1/p(n) for every polynomial p.)


3.1 Our construction 


GENERATE On input 1^n, we generate two bases B and R of the same full-rank lattice in ℤ^n and a
positive real number σ. We generate these bases so that R has a low dual orthogonality defect
and B has a high dual orthogonality defect. We describe the generation process in detail
in Section 3.3. The bases B, R are represented by n × n matrices where the basis vectors
are the columns of these matrices. In the sequel we call B the “public basis” and R the
“private basis”. We view (B, σ) as the description of a function f_{B,σ}, and R as the trapdoor
information. The domain of f_{B,σ} consists of pairs of vectors (v, e) with v ∈ ℤ^n and e ∈ ℝ^n.


SAMPLE Given (B, σ), we output vectors v ∈ ℤ^n and e ∈ ℝ^n as follows:


The vector v is chosen at random from a “large enough” cube in ℤ^n. For example, we can
pick each entry in v uniformly at random from the range {−n^2, −n^2 + 1, ..., +n^2}.²


The vector e is chosen at random from ℝ^n, so that each entry in it has zero mean and variance
σ^2. For example, we can pick each entry in e as ±σ, each with probability 1/2. Alternatively,
if we want e to have integral entries, we can pick each entry as equal to ±⌈σ⌉, each with
probability σ^2/(2⌈σ⌉^2), and 0 with probability 1 − σ^2/⌈σ⌉^2.


EVALUATE Given B, σ, v, e, we compute c = f_{B,σ}(v, e) = Bv + e.


INVERT Given R and c, we use Babai’s Round-Off algorithm [Ba86] to invert the function. Namely,
we represent c as a linear combination of the columns of R and then round the coefficients
in this linear combination to the nearest integers to get a lattice point. The representation of
this lattice point as a linear combination of the columns of B is the vector v. Once we have
v we can compute e. More precisely, denote T = B^{-1}R; we compute v = T⌊R^{-1}c⌉ and
e = c − Bv, where ⌊x⌉ denotes rounding each entry of x to the nearest integer.
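The four algorithms can be sketched end to end in Python, under loudly stated assumptions. The recipe in generate (a private basis equal to 4n·I plus entries drawn from {−2, ..., 2}, mixed into the public basis by 4n random elementary column operations) is our own hypothetical stand-in, since the paper's actual generation procedure is the subject of Section 3.3. All arithmetic is exact over the rationals, and n = 6 is a toy dimension with no security. Because this private basis is strictly diagonally dominant (diagonal at least 22, off-diagonal row sum at most 10), every row of R^{-1} has L_1 norm at most 1/12, so σ = 3 never causes an inversion error.

```python
import random
from fractions import Fraction
from math import floor


def mat_inv(M):
    """Exact inverse of a non-singular square matrix (Gauss-Jordan over the rationals)."""
    n = len(M)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_vec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def generate(n, rng):
    """Hypothetical toy GENERATE: R = 4n*I + small noise, then B = R*U with U unimodular."""
    R = [[(4 * n if i == j else 0) + rng.randint(-2, 2) for j in range(n)] for i in range(n)]
    B = [row[:] for row in R]
    for _ in range(4 * n):
        i, j = rng.sample(range(n), 2)
        k = rng.choice([-2, -1, 1, 2])
        for row in B:
            row[i] += k * row[j]            # column i += k * column j
    return B, R


def sample(n, sigma, rng):
    v = [rng.randint(-n * n, n * n) for _ in range(n)]    # lattice coordinates in [-n^2, n^2]
    e = [rng.choice([-sigma, sigma]) for _ in range(n)]   # error entries +-sigma
    return v, e


def evaluate(B, v, e):
    return [x + y for x, y in zip(mat_vec(B, v), e)]      # c = Bv + e


def invert(B, R, c):
    T = mat_mul(mat_inv(B), R)                            # T = B^{-1} R (unimodular)
    rounded = [floor(t + Fraction(1, 2)) for t in mat_vec(mat_inv(R), c)]
    v = [int(t) for t in mat_vec(T, rounded)]             # v = T * round(R^{-1} c)
    e = [x - y for x, y in zip(c, mat_vec(B, v))]         # e = c - Bv
    return v, e


rng = random.Random(2024)
B, R = generate(6, rng)
v, e = sample(6, 3, rng)
print(invert(B, R, evaluate(B, v, e)) == (v, e))   # True
```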


3.2 The Inversion Algorithm


In this section we show how σ can be chosen so that the inversion algorithm is successful with high
probability. Recall that the inversion algorithm succeeds in inverting the function on c if using the
private basis R in Babai’s “round-off” algorithm results in finding the closest lattice point to c.
Below we suggest two different ways to bound the value of σ, based on the L_1 norm and the L_∞ norm
of rows in R^{-1}. Both bounds use the following lemma.


Lemma 3.1: Let R be the private basis used in the inversion of f_{B,σ}(v, e). Then an inversion
error occurs if and only if ⌊R^{-1}e⌉ ≠ 0.


² We do not know if the size of this range has any influence on the security of the construction. The value n^2 is
rather arbitrary, and was only chosen to get integers of about 16 bits for the parameters which we work with.


Proof: Let T be the unimodular transformation matrix T = B^{-1}R. Then the inversion algorithm
is v = T⌊R^{-1}c⌉ and e = c − Bv. Obviously, if v is computed correctly then so is e. Thus, let us
examine the conditions under which this algorithm finds the correct vector v. Recall that c was
computed as c = Bv + e, so

$$ T\lfloor R^{-1}c \rceil = T\lfloor R^{-1}(Bv + e) \rceil = T\lfloor R^{-1}Bv + R^{-1}e \rceil = T\lfloor (BT)^{-1}Bv + R^{-1}e \rceil = T\lfloor T^{-1}v + R^{-1}e \rceil. $$

But since T is a unimodular matrix (and therefore so is T^{-1}) and since v is an integral vector,
T^{-1}v is also an integral vector. Hence we have ⌊T^{-1}v + R^{-1}e⌉ = T^{-1}v + ⌊R^{-1}e⌉, and therefore

$$ T\lfloor R^{-1}c \rceil = T\left( T^{-1}v + \lfloor R^{-1}e \rceil \right) = v + T\lfloor R^{-1}e \rceil. $$

Thus the inversion algorithm succeeds if and only if T⌊R^{-1}e⌉ = 0, and since T is non-singular,
this holds if and only if ⌊R^{-1}e⌉ = 0. □
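A quick exhaustive check of Lemma 3.1 on a toy two-dimensional instance (the matrices R = [[10, 1], [1, 10]] and B = RU with U = [[5, 4], [6, 5]] are illustrative assumptions, and the helper names are hypothetical): for every integer error vector e in [−8, 8]^2, round-off inversion recovers v exactly when ⌊R^{-1}e⌉ = 0, and both outcomes actually occur in this range.

```python
from fractions import Fraction
from math import floor


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_vec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]


def round_vec(x):
    return [floor(t + Fraction(1, 2)) for t in x]


R = [[10, 1], [1, 10]]     # private basis (columns)
B = [[56, 45], [65, 54]]   # public basis B = R U, U = [[5, 4], [6, 5]]
T = mat_mul(inv2(B), R)    # T = B^{-1} R, an integer unimodular matrix
v = [2, -1]

agree = successes = failures = 0
for e1 in range(-8, 9):
    for e2 in range(-8, 9):
        e = [e1, e2]
        c = [x + y for x, y in zip(mat_vec(B, v), e)]
        recovered = mat_vec(T, round_vec(mat_vec(inv2(R), c))) == v
        predicted = round_vec(mat_vec(inv2(R), e)) == [0, 0]   # Lemma 3.1
        agree += recovered == predicted
        successes += recovered
        failures += not recovered

print(agree, successes > 0, failures > 0)   # 289 True True
```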


We proceed to show the bounds on σ. In both theorems below we assume that each entry
in the error vector e is chosen equal to ±σ, each with probability one half. We start by asserting
that we can choose σ so that we never get any inversion errors.


Theorem 1: Let R be the private basis used in the inversion of f_{B,σ}, and denote the maximum
L_1 norm of the rows in R^{-1} by ρ. Then as long as σ < 1/(2ρ), no inversion errors can occur.
Proof: We first introduce a few notations. We denote d = R^{-1}e and denote the i’th entries of d
and e by δ_i and ε_i, respectively. Also, we denote the i’th row of R^{-1} by ρ_i and the (i, j)’th element
of R^{-1} by ρ_ij.

By Lemma 3.1 above, we get an inversion error only when ⌊R^{-1}e⌉ ≠ 0, which requires
|δ_i| ≥ 1/2 for some i. However, since all the entries in e are equal to ±σ, we get for every i

$$ |\delta_i| = |\rho_i \cdot e| = \Big| \sum_j \rho_{ij} \varepsilon_j \Big| \le \sigma \cdot \sum_j |\rho_{ij}| \le \sigma\rho < \frac{1}{2}. $$

□
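For the toy private basis R = [[10, 1], [1, 10]] (an illustrative choice), the maximum L_1 row norm of R^{-1} is ρ = 11/99 = 1/9, so Theorem 1 allows any σ < 9/2. The sketch computes ρ exactly and checks all four sign patterns of e ∈ {±σ}^2; the helper names are hypothetical. At σ = 5, beyond the bound, an error does occur for this R, although Theorem 1 itself only states a sufficient condition.

```python
from fractions import Fraction
from itertools import product
from math import floor


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


R = [[10, 1], [1, 10]]                                   # private basis (columns)
R_inv = inv2(R)
rho = max(sum(abs(x) for x in row) for row in R_inv)     # max L1 norm of the rows of R^{-1}
sigma_max = 1 / (2 * rho)                                # Theorem 1: sigma < 1/(2 rho) suffices


def inversion_error_possible(sigma):
    """True if some e in {+-sigma}^2 gives round(R^{-1} e) != 0, i.e. an error by Lemma 3.1."""
    for signs in product([-1, 1], repeat=2):
        e = [s * sigma for s in signs]
        d = [sum(x * y for x, y in zip(row, e)) for row in R_inv]
        if any(floor(t + Fraction(1, 2)) != 0 for t in d):
            return True
    return False


print(rho, sigma_max)                                              # 1/9 9/2
print(inversion_error_possible(4), inversion_error_possible(5))    # False True
```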


Although Theorem 1 gives a sufficient condition to get the error probability down to 0, we may
choose to set a higher value for σ in order to get better security. The next theorem asserts a
different bound on σ, which guarantees a low error probability.


Theorem 2: Let R be the private basis used in the inversion of f_{B,σ}, and denote the maximum
L_∞ norm of the rows in R^{-1} by γ/√n. Then the probability of inversion errors is bounded by

$$ \Pr[\text{inversion error using } R] \le 2n \cdot \exp\left( -\frac{1}{8\sigma^2\gamma^2} \right) \qquad (1) $$
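Proof: We use the notations d, δ_i, ε_i, ρ_i and ρ_ij as in the proof of Theorem 1. We first fix some
i and evaluate Pr[|δ_i| ≥ 1/2]. Recall that δ_i = ρ_i · e = Σ_j ρ_ij ε_j. Since for all j, |ρ_ij| ≤ γ/√n, and the ε_j
are independent, each equal to ±σ with probability 1/2, the random variables ρ_ij ε_j are independent,
have zero mean and are all limited to the interval [−σγ/√n, +σγ/√n]. Therefore we can use the Hoeffding
bound, Pr[|Σ_j X_j| ≥ t] ≤ 2 exp(−2t^2 / Σ_j (b_j − a_j)^2) for independent zero-mean X_j ∈ [a_j, b_j], with
t = 1/2 and Σ_j (b_j − a_j)^2 = n · (2σγ/√n)^2 = 4σ^2γ^2, to conclude that

$$ \Pr\left[ |\delta_i| \ge \frac{1}{2} \right] = \Pr\left[ \Big| \sum_j \rho_{ij} \varepsilon_j \Big| \ge \frac{1}{2} \right] \le 2\exp\left( -\frac{1}{8\sigma^2\gamma^2} \right). $$

Using the union bound over the n possible values of i completes the proof. □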
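The per-row Hoeffding step can be sanity-checked by simulation. The sketch below (the parameters n = 64, σ = γ = 1/2 and the seed are arbitrary assumptions; the function names are hypothetical) takes a row in which every |ρ_ij| equals γ/√n, the largest magnitude the theorem's hypothesis allows, and compares the empirical frequency of |δ_i| ≥ 1/2 with 2exp(−1/(8σ^2γ^2)). The empirical value comes out well below the bound, consistent with the pessimism noted in the Remark.

```python
import math
import random


def hoeffding_row_bound(sigma, gamma):
    """Per-row bound from the proof: Pr[|delta_i| >= 1/2] <= 2 exp(-1 / (8 sigma^2 gamma^2))."""
    return 2 * math.exp(-1 / (8 * sigma ** 2 * gamma ** 2))


def empirical_row_error(n, sigma, gamma, trials, rng):
    """Monte Carlo estimate of Pr[|delta_i| >= 1/2] for a row with every |rho_ij| = gamma/sqrt(n)."""
    rho_row = [rng.choice([-1, 1]) * gamma / math.sqrt(n) for _ in range(n)]
    bad = 0
    for _ in range(trials):
        delta = sum(r * rng.choice([-sigma, sigma]) for r in rho_row)
        bad += abs(delta) >= 0.5
    return bad / trials


rng = random.Random(7)
n, sigma, gamma = 64, 0.5, 0.5
bound = hoeffding_row_bound(sigma, gamma)             # 2 e^{-2}
estimate = empirical_row_error(n, sigma, gamma, 5000, rng)
print(round(bound, 3), estimate <= bound)             # 0.271 True
```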


Remark. The last theorem implies that to get the error probability below ε it is sufficient to
choose σ ≤ (γ√(8 ln(2n/ε)))^{-1}. In fact, the above bound is overly pessimistic in that it only looks
at the largest entry in R^{-1}. A more refined bound can be obtained by considering the few largest
entries in each row separately and applying the above argument to the rest of the entries.

Alternatively, we can get an estimate (rather than a bound) of the error probability by using Equation 1 as if all the entries in each row of $R^{-1}$ had the same absolute value. In this case $\gamma$ is the maximum Euclidean norm of the rows in $R^{-1}$, so we get an estimate of the error probability in terms of the Euclidean norm of the rows in $R^{-1}$. This estimate is about the same as the one we get by viewing each of the $\delta_i$'s as a zero-mean Gaussian random variable with variance $(\sigma\|\vec r_i\|)^2$ (where $\|\vec r_i\|$ is the Euclidean norm of the $i$'th row in $R^{-1}$).
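The Gaussian heuristic is easy to evaluate numerically. The sketch below (Python, standard library only; the function name and the example row norms are illustrative assumptions) sums $\Pr[|\delta_i| \geq 1/2] = \operatorname{erfc}\big(1/(2\sqrt{2}\,\sigma\|\vec r_i\|)\big)$ over the rows, i.e. it applies a union bound to the Gaussian approximation. It is an estimate, not a bound.

```python
import math


def gaussian_error_estimate(row_norms, sigma):
    """Estimate Pr[inversion error] by treating each delta_i as N(0, (sigma * ||r_i||)^2).

    row_norms: Euclidean norms of the rows r_i of R^{-1}.
    Returns the union-bound sum of Pr[|delta_i| >= 1/2]; a heuristic, not a bound.
    """
    total = 0.0
    for norm in row_norms:
        std = sigma * norm  # standard deviation of delta_i
        total += math.erfc(0.5 / (std * math.sqrt(2.0)))  # Pr[|N(0, std^2)| >= 1/2]
    return total


# All 140 rows of R^{-1} with Euclidean norm 1/30, sigma = 2.
print(gaussian_error_estimate([1 / 30] * 140, 2.0))
```

For comparison, Equation (1) with $\gamma = 1/30$, $n = 140$ and $\sigma = 2$ gives $280\,e^{-28.125} \approx 1.7\cdot 10^{-10}$, while this estimate comes out roughly twenty times smaller.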

To get a feeling for the size of the parameters involved, consider the parameters $n = 140$, $\epsilon = 2^{-\ldots}$. For a certain setting of the parameters which we tested, the Euclidean norm of all the rows in $R^{-1}$ is below $1/30$. Evaluating the expression above for $\gamma = 1/30$ yields a corresponding upper bound on $\sigma$.


For the same parameters of $R$, setting $\epsilon = 10^{-4}$ yields $\sigma \leq 2.7$.
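The condition from the Remark after Theorem 2 can be inverted in a couple of lines. This Python sketch (function names are our own) computes the largest $\sigma$ allowed by Equation (1) for given $\gamma$, $n$ and $\epsilon$. With $n = 140$, $\gamma = 1/30$ and $\epsilon = 10^{-4}$ it gives about 2.75, which is consistent with the value quoted above.

```python
import math


def sigma_for_error(gamma, n, eps):
    """Largest sigma with 2n * exp(-1 / (8 sigma^2 gamma^2)) <= eps."""
    return 1.0 / (gamma * math.sqrt(8.0 * math.log(2 * n / eps)))


def error_bound(gamma, n, sigma):
    """Right-hand side of Equation (1)."""
    return 2 * n * math.exp(-1.0 / (8.0 * sigma**2 * gamma**2))


sigma_max = sigma_for_error(1 / 30, 140, 1e-4)
print(f"sigma <= {sigma_max:.3f}")
print(error_bound(1 / 30, 140, sigma_max))  # equals eps up to floating-point rounding
```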


3.3 The GENERATE Algorithm 


In this section we discuss various aspects of the GENERATE algorithm. We described in Section 3.2 how the value of $\sigma$ can be computed once we have the private basis $R$. Now we suggest a few ways to pick $R$ and $B$. Recall that $R$ and $B$ are two bases for some lattice in $\mathbb{Z}^n$, where $R$ has a small dual orthogonality defect and $B$ has a large dual orthogonality defect. Our high-level approach for generating the private and public bases is to choose $n$ random vectors in $\mathbb{Z}^n$ to get the private basis, and then to mix them to get the public one. There are two distributions to consider in this process:


• A distribution on the lattices in $\mathbb{Z}^n$, which is induced by the choice of the private basis $R$.


• Once we have the private basis $R$, there is a distribution which is induced on the bases of $L(R)$ by the process of “mixing” $R$ to get the public basis $B$.


To guide us through the choices of the various parameters, we relied on experimental results (See 
Section 6). Below we briefly discuss the various parameters which are involved in this process. 


3.3.1 Lattice dimension 


The first parameter we need to set is the dimension of the lattice (the value of $n$). Clearly, we expect our schemes to be more secure the larger $n$ is. On the other hand, both the space needed for the key pair and the running time of function evaluation and function inversion grow (at least) as $\Omega(n^2)$.

The lattice-reduction algorithm which we used for our experiments is capable of finding a basis with very small orthogonality defect as long as the lattice dimension is no more than 60-80 (depending on other parameters). Beyond this point, the quality of the bases we get from this lattice-reduction algorithm degrades rapidly with the dimension. In particular, we found that in dimension 100, the bases we obtained had a high dual orthogonality defect. At the present time, the best “practical lattice-reduction algorithm” which we are aware of is Schnorr's block-reduction scheme (which was used to attack the Chor-Rivest cryptosystem, see [Sc95]). We speculate that working in dimensions of about 150-200 might be good enough with respect to this algorithm.


3.3.2 Distribution of the private basis 


After setting the dimension, we need to decide on the distribution according to which we choose 
the private basis. We considered two possible such distributions. 


1. Choosing a “random lattice”: We choose a matrix $R$ which is uniformly distributed in $\{-l,\dots,+l\}^{n\times n}$ for some integer bound $l$. In our experiments, the value of $l$ had almost no effect on the quality of the bases which we got. Therefore we chose to work with small integers (e.g., between $\pm 4$), since this makes some of the operations more efficient.


2. Choosing a “rectangular lattice”: We start from the box $kI$ in $\mathbb{R}^n$ (for some number $k$), and add “noise” to each of the box vectors. Namely, we first pick a matrix $R'$ which is uniformly distributed in $\{-l,\dots,+l\}^{n\times n}$, and then compute $R \leftarrow R' + kI$.


The larger the value of $k$, the smaller the dual orthogonality defect of the basis generated by this process; however, a large $k$ may also allow an attacker to obtain a basis with a smaller dual orthogonality defect by reducing the public basis.
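Both distributions for the private basis are straightforward to sample. The following minimal Python sketch uses hypothetical function names and example parameters. Note that the first distribution can, with small probability, produce a singular matrix, which a real implementation would have to detect and resample.

```python
import random


def random_private_basis(n, l, rng):
    """Distribution 1 ("random lattice"): R uniform in {-l, ..., +l}^{n x n}."""
    return [[rng.randint(-l, l) for _ in range(n)] for _ in range(n)]


def rectangular_private_basis(n, l, k, rng):
    """Distribution 2 ("rectangular lattice"): R = R' + kI, R' uniform in {-l, ..., +l}^{n x n}."""
    R = random_private_basis(n, l, rng)
    for i in range(n):
        R[i][i] += k  # add k on the diagonal, i.e. start from the box kI
    return R


rng = random.Random(2024)
R = rectangular_private_basis(n=6, l=4, k=30, rng=rng)
for row in R:
    print(row)
```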


3.3.3 Generating the public basis


Once we have the private basis $R$, we should pick the public basis according to some distribution on the bases of the lattice $L(R)$. Since every basis of $L(R)$ is obtained as $B = RT'$ for some unimodular transformation matrix $T'$, picking $B$ when we have $R$ is equivalent to picking a “random unimodular transformation”. We tried two ways of generating these “random unimodular transformations”. Both methods work by multiplying many “elementary matrices” of different forms.


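• One type of elementary matrices which we considered are matrices of the form

$$\begin{pmatrix} 1 & & * & & \\ & 1 & * & & \\ & & 1 & & \\ & & * & 1 & \\ & & * & & 1 \end{pmatrix} \qquad\text{or}\qquad \begin{pmatrix} 1 & & & & \\ & 1 & & & \\ * & * & 1 & * & * \\ & & & 1 & \\ & & & & 1 \end{pmatrix}$$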

where the $*$'s represent arbitrary integers and the blanks represent zeros. (The first form corresponds to adding to the $i$'th column an integer linear combination of the other columns, and the second corresponds to adding an integral multiple of the $i$'th column to all the other columns.) We typically chose the $*$ components at random from $\{-1,0,1\}$ with a bias towards 0 (specifically, we used $\Pr[1] = \Pr[-1] = 1/7$). This was done so that the size of the numbers in the public basis would not grow too fast.

An important parameter in this process is the number of elementary matrices which we multiply together (which we refer to as the “number of mixing steps”). In our experiments we only used matrices of the first form, and went through the values of $i$ in order to make sure that we hit them all. Our experiments indicate that using $2n$ such matrices is sufficient.


• Another possible type of elementary matrices are upper- and lower-triangular matrices with $\pm 1$ on the main diagonal. We did very few experiments with these matrices. In these experiments we chose the off-diagonal entries in $L$ and $U$ (which are lower- and upper-triangular respectively) from $\{-1,0,1\}$. We found that we need to multiply at least 4 $LU$ pairs to prevent LLL from recovering the original basis.
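The first mixing method can be sketched as follows. Each step adds to column $i$ of the current basis a random $\{-1,0,1\}$-combination of the other columns (an elementary matrix of the first form, applied on the right), with $\Pr[1] = \Pr[-1] = 1/7$. The steps cycle through $i$ so that every column is hit. Such column operations leave the determinant unchanged, which gives a cheap sanity check. This is an illustrative Python sketch under these assumptions, not the authors' code, and `mixing_step`, `mix` and `det` are hypothetical names.

```python
import random
from fractions import Fraction


def mixing_step(B, i, rng, p=1 / 7):
    """In place: column i += sum_j a_j * column j (j != i), a_j in {-1, 0, 1}, Pr[+-1] = p each."""
    n = len(B)
    for j in range(n):
        if j == i:
            continue
        u = rng.random()
        a = 1 if u < p else (-1 if u < 2 * p else 0)
        if a:
            for r in range(n):
                B[r][i] += a * B[r][j]


def mix(R, steps, rng):
    """Public basis B = R T', where T' is a product of `steps` elementary matrices."""
    B = [row[:] for row in R]
    for t in range(steps):
        mixing_step(B, t % len(B), rng)  # go through the values of i in order
    return B


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if A[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return d


R = [[5, 1, 0], [1, 6, -1], [0, 2, 7]]
B = mix(R, steps=2 * len(R), rng=random.Random(0))  # "2n mixing steps"
assert det(B) == det(R)  # unimodular mixing preserves the determinant
```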


Comparing the two methods, we found that for the same “level of security”, the second method required a basis $B$ with larger entries. Thus we used the first method in most of our experiments.
One way to keep the entries of the public basis small (using either of the distributions above) is to 
LLL-reduce the mixed basis. This does not affect the security of the trapdoor function (since an 
attacker can do the same thing). However, when used in the encryption scheme which we suggest 
in Section 4, there may be some advantage in keeping the entries in B “not too small”. 


3.4 Bases representation


To make evaluating and inverting the function more efficient, we chose the following representation for the private and public bases. The public basis is represented by the integer matrix $B$ whose columns are the basis vectors, so that evaluating $f_{B,\sigma}(v,e) = Bv + e$ can be done in quadratic time. To invert $f_{B,\sigma}$ efficiently, however, we do not store the private basis $R$ itself. Instead, we store the matrix $R^{-1}$ and the unimodular matrix $T = B^{-1}R$. Then, to compute $f_{B,\sigma}^{-1}(c)$ we set $v \leftarrow T\lceil R^{-1}c\rfloor$ and $e \leftarrow c - Bv$, both of which can be done in quadratic time.
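The inversion step $v \leftarrow T\lceil R^{-1}c\rfloor$, $e \leftarrow c - Bv$ (Babai's round-off with the private basis) can be sketched as follows on a toy two-dimensional instance. The matrices are illustrative assumptions: $R$ is the private basis, $B = RT'$ with the unimodular $T' = \begin{pmatrix}1&1\\0&1\end{pmatrix}$, and therefore $T = B^{-1}R = T'^{-1}$. Here $R^{-1}$ is stored exactly as fractions.

```python
from fractions import Fraction
import math


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def invert(Rinv, T, B, c):
    """Compute v = T * round(R^{-1} c) and e = c - B v."""
    k = [math.floor(x + Fraction(1, 2)) for x in matvec(Rinv, c)]  # nearest integers
    v = matvec(T, k)
    e = [ci - bi for ci, bi in zip(c, matvec(B, v))]
    return v, e


# Toy keys: R = [[7, 1], [2, 5]] (columns are basis vectors), B = R T'.
Rinv = [[Fraction(5, 33), Fraction(-1, 33)],
        [Fraction(-2, 33), Fraction(7, 33)]]  # exact inverse of R
B = [[7, 8], [2, 7]]                           # public basis
T = [[1, -1], [0, 1]]                          # T = B^{-1} R (unimodular)

v, e = [3, -2], [1, -1]                           # lattice coordinates and small error
c = [bi + ei for bi, ei in zip(matvec(B, v), e)]  # c = Bv + e = [6, -9]
print(invert(Rinv, T, B, c))                      # recovers ([3, -2], [1, -1])
```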

Representing $B$ and $T$ is easy since they are integral matrices, but $R^{-1}$ is not an integral matrix, so we need to consider how it should be represented. One possibility, of course, is to keep the exact values of all the entries in $R^{-1}$. After all, the entries of $R^{-1}$ are all rational, and the number of bits it takes to write them down is at most polynomial in the number of bits of $R$. This approach, however, is rather expensive in terms of running time. Although the entries in $R$ are small (typically only 2-3 bits), the determinant of $R$ is much larger (about 200 bits if $R$ is a $100\times 100$ matrix), which means that we need to work with very large numbers in order to perform operations on $R^{-1}$. A different approach is to keep only a few bits of each entry in $R^{-1}$. This, of course, may introduce errors. If we keep only $\ell$ bits per entry, then we get an error of at most $2^{-\ell}$ in each entry.

Clearly, this has no effect on the security of the system (since it only affects the operations done using the private basis), but it may increase the probability of inversion errors. Since we only perform linear operations on $R^{-1}$, it is rather straightforward to evaluate the effect of adding small errors to its entries. Denote the “error matrix” by $E = (\varepsilon_{ij})$. That is, $\varepsilon_{ij}$ is the difference between the value which is stored for $(R^{-1})_{ij}$ and the real value of that entry. Then we have $|\varepsilon_{ij}| \leq 2^{-\ell}$ for all $i,j$. When inverting the function, we apply the same procedure as above, but use the matrix $R' = R^{-1} + E$ instead of the matrix $R^{-1}$ itself.

Recall that the value of the function is c = Bv +e, where v is an integer vector and e is the 
“error vector”. Thus the vector v’ computed by the inversion routine is 


$$v' = T\lceil R'c\rfloor = T\lceil (R^{-1} + E)(Bv + e)\rfloor = v + T\lceil R^{-1}e + E(Bv + e)\rfloor$$


where the last equality follows since $R^{-1}Bv$ is an integral vector, so we can take it out of the rounding operation, and then we have $TR^{-1}Bv = v$. Therefore, we invert correctly if and only if $\lceil R^{-1}e + E(Bv + e)\rfloor = 0$, which means that all the entries in $R^{-1}e + E(Bv + e)$ are less than $\frac{1}{2}$ in absolute value. The size of the entries in the vector $R^{-1}e$ is analyzed in Section 3.2, so here we only consider the vector $E(Bv + e)$.

Recall that all the entries in $E$ are at most $2^{-\ell}$ in absolute value, and that the error vector $e$ consists only of small entries (e.g., for our parameters, the entries in $e$ are always less than 10). Thus the contribution of the vector $Ee$ is at most $10n\cdot 2^{-\ell}$ in each coordinate, so we might as well ignore it. To evaluate the entries in $EBv$, assume that we represent each entry in the matrix $B$ using $k$ bits, and each entry in the vector $v$ using $m$ bits. Then each entry of $Bv$ is smaller than $n\cdot 2^{k+m}$ in absolute value, and so each entry in the vector $EBv$ must be smaller than $n^2\cdot 2^{k+m-\ell}$ in absolute value.

For example, if we work in dimension 200, use 16 bits for each entry in $B$ and 16 bits for each entry in $v$, and keep only the 64 most significant bits of each entry in $R^{-1}$, then the entries in $EBv$ will be bounded by $200^2\cdot 2^{16+16-64} < 2^{-16}$. Thus, a sufficient condition for correct inversion is that each entry in $R^{-1}e$ is less than $\frac{1}{2} - 2^{-16}$ in absolute value (as opposed to less than $\frac{1}{2}$, which we get when we store the exact values of $R^{-1}$). Clearly, this has almost no effect on the probability of inversion errors.
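The effect of storing only $\ell$ fractional bits of each entry of $R^{-1}$ can be checked with a few lines of Python. The sketch truncates each entry to a multiple of $2^{-\ell}$, so $|\varepsilon_{ij}| < 2^{-\ell}$. It then evaluates the worst-case count $n^2 2^{k+m-\ell}$ for the example numbers: each entry of $Bv$ is below $n 2^{k+m}$, and each row of $E$ has $n$ entries. Finally, it confirms on a toy $2\times 2$ key (an illustrative assumption) that inversion still succeeds with only 8 fractional bits. Function names are hypothetical.

```python
from fractions import Fraction
import math


def truncate(x, ell):
    """Keep ell fractional bits of x (rounding down), so 0 <= x - result < 2^-ell."""
    scale = 2 ** ell
    return Fraction(math.floor(x * scale), scale)


def perturbation_bound(n, k, m, ell):
    """Worst-case bound on |(E B v)_i| with |E_ij| < 2^-ell, |B_ij| < 2^k, |v_j| < 2^m."""
    return Fraction(n * n * 2 ** (k + m), 2 ** ell)


# The numbers in the text: n = 200, k = m = 16, ell = 64.
assert perturbation_bound(200, 16, 16, 64) < Fraction(1, 2 ** 16)

# Toy check: R = [[7, 1], [2, 5]], B = [[7, 8], [2, 7]], T = B^{-1} R, c = B(3, -2) + (1, -1).
Rinv = [[Fraction(5, 33), Fraction(-1, 33)], [Fraction(-2, 33), Fraction(7, 33)]]
Rinv8 = [[truncate(x, 8) for x in row] for row in Rinv]  # only 8 fractional bits kept
T, c = [[1, -1], [0, 1]], [6, -9]
rounded = [math.floor(sum(a * b for a, b in zip(row, c)) + Fraction(1, 2)) for row in Rinv8]
v = [sum(a * b for a, b in zip(row, rounded)) for row in T]
assert v == [3, -2]
```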


3.5 Security Analysis 


In this section we provide some analysis for the security of the suggested trapdoor function by 
considering several possible attacks and trying to analyze their work-load. We start with evaluating 
the work-load of a brute-force attack. 


3.5.1 Brute-Force Attack 


An obvious pre-processing step in every attack on this construction is to reduce the public basis 
B to get a better basis B’ which can then be used to invert the function. Notice that the only 
feature of $R$ which we used when we analyzed the error probability is that the rows of $R^{-1}$ have
small Euclidean norm (in other words, R has a very small dual orthogonality defect). If we can 
find another basis with this property, then we can use it just as well. However, finding a basis with 
a very low dual orthogonality defect is assumed to be a hard problem. 

Thus, we assume that even after the lattice reduction, the attacker still has a basis $B'$ with a rather large dual orthogonality defect.* For the sake of simplicity, we assume that the basis used by the attacker is the public basis $B$. Trying to use the basis $B$ for inverting the function in the same manner as we use the basis $R$ means that, given the ciphertext $c = Bv + e$, we compute $B^{-1}c = v + B^{-1}e$. Then we can do an exhaustive search for the vector $d = B^{-1}e$. Below we give an approximate analysis of the size of the search space that the attacker needs to go through before it finds the correct vector $d$.

*This will be the case, for example, if the public basis $B$ is obtained by applying a “good lattice-reduction algorithm” to the basis which was obtained by mixing the private basis $R$.

Denote the $i$'th entries in $d$ and $e$ by $\delta_i$ and $\varepsilon_i$ respectively, the $i$'th row of $B^{-1}$ by $\vec b_i$, and the $(i,j)$'th element of $B^{-1}$ by $\beta_{ij}$. Using these notations we can write $\delta_i = \vec b_i \cdot e = \sum_j \beta_{ij}\varepsilon_j$, and therefore

$$E[\delta_i] = 0 \qquad\text{and}\qquad \mathrm{VAR}[\delta_i] = \sum_j \beta_{ij}^2\, E[\varepsilon_j^2] = (\sigma\|\vec b_i\|)^2,$$

where $\|\vec b_i\|$ is the Euclidean norm of the $i$'th row of $B^{-1}$.

To evaluate the size of this search space for $d$, we make the simplifying assumptions that each entry $\delta_i$ in $d$ is Gaussian, and that the entries are independent. Based on these simplifying assumptions, the size of the effective search space is exponential in the differential entropy of the Gaussian random vector $d$. Recall that the differential entropy of a Gaussian random variable $x$ with variance $s^2$ is $h(x) = \frac{1}{2}\log(2\pi e s^2)$ (logarithms to base 2 throughout). Since we assume that the $\delta_i$'s are independent, the differential entropy of the vector $d$ equals the sum of the differential entropies of the entries, so we get

$$h(d) = \sum_i \frac{1}{2}\log\left(2\pi e\,\sigma^2\|\vec b_i\|^2\right) = \frac{n}{2}\log\left(2\pi e\,\sigma^2\right) + \sum_i \log\|\vec b_i\|,$$

so the size of the search space is

$$2^{h(d)} = (2\pi e)^{n/2}\cdot\sigma^n\cdot\prod_i\|\vec b_i\| = (2\pi e)^{n/2}\cdot\sigma^n\cdot\frac{\text{orth-defect}^*(B)}{|\det(B)|}.$$

Note that the term $|\det(B)|$ in the last expression depends only on the lattice and is independent of the actual basis $B$.
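Under the same simplifying assumptions, the attacker's work-load can be evaluated directly from the row norms of $B^{-1}$. The following Python sketch (the function name is an assumption) returns the base-2 exponent $h(d)$ of the search-space size.

```python
import math


def log2_search_space(sigma, row_norms):
    """log2 of 2^{h(d)} = (2*pi*e)^{n/2} * sigma^n * prod_i ||b_i||.

    row_norms: Euclidean norms of the rows b_i of B^{-1} (the basis the attacker uses).
    """
    n = len(row_norms)
    return (n / 2) * math.log2(2 * math.pi * math.e) + n * math.log2(sigma) + sum(
        math.log2(r) for r in row_norms
    )


# One coordinate with sigma = 1 and a unit-norm row: (1/2) log2(2*pi*e), about 2.05 bits.
print(log2_search_space(1.0, [1.0]))
```

Scaling $\sigma$ by a factor of 2 adds exactly $n$ bits to the exponent.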


Typical numeric values. In the experiments which we performed (in dimension 100 with $\sigma = 2$)
evaluating this last expression on the (LLL-reduced) public bases resulted in typical work-load of 
about 10° x 278°, 


3.5.2 Other Attacks 


In this section we discuss other possible lines of attack against the scheme. One rather obvious improvement on the brute-force attack described above is to use a better approximation algorithm for the CVP. In particular, instead of using Babai's “Round-off” algorithm we can use the “Nearest-plane” algorithm, which was also described in [Ba86]. At a high level, the difference between the Round-off and the Nearest-plane algorithms is that in the Nearest-plane algorithm, the rounding of the different entries is done adaptively (rather than all at once).

One way to describe the Nearest-plane algorithm (which is somewhat different from the way it is described in [Ba86]) is as follows. We are given the point $c$ and the (LLL-reduced) basis $B = \{b_1,\dots,b_n\}$ (in the order induced by the LLL reduction). We compute the representation of $c$ as a linear combination of the $b_i$'s, $c = \sum_i \delta_i b_i$, but we only look at the last coefficient $\delta_n$. We then replace $c$ with the point $c' = c - \lceil\delta_n\rfloor b_n$, and replace $b_n$ with a vector $b_n^*$ which is orthogonal to all the other basis vectors. Denote the new basis by $B' = \{b_n^*, b_1, \dots, b_{n-1}\}$. We then apply the same process to $c'$ and $B'$ (this time looking at the coefficient of $b_{n-1}$). We repeat this until we have eliminated all the vectors of the original basis $B$. It is clear that if at each step we got the right coefficient, then the vector which is left at the end is just the error vector $e$.
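For comparison with the round-off approach, here is a minimal Python sketch of Babai's Nearest-plane algorithm in its usual Gram-Schmidt form. Like the description above, it processes $b_n$ first and works down to $b_1$, and it uses exact rational arithmetic. It includes none of Coppersmith's improvements, and the toy basis is an assumption for illustration.

```python
from fractions import Fraction
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(basis):
    """Return b_1^*, ..., b_n^*: the Gram-Schmidt orthogonalization of the basis vectors."""
    bstar = []
    for b in basis:
        w = [Fraction(x) for x in b]
        for u in bstar:
            mu = dot(b, u) / dot(u, u)
            w = [wi - mu * ui for wi, ui in zip(w, u)]
        bstar.append(w)
    return bstar


def nearest_plane(basis, c):
    """Return integer coefficients a and residual r with c = sum_i a_i b_i + r."""
    bstar = gram_schmidt(basis)
    r = [Fraction(x) for x in c]
    a = [0] * len(basis)
    for i in reversed(range(len(basis))):  # last coefficient first
        a[i] = math.floor(dot(r, bstar[i]) / dot(bstar[i], bstar[i]) + Fraction(1, 2))
        r = [ri - a[i] * bi for ri, bi in zip(r, basis[i])]
    return a, r


# Toy lattice spanned by b_1 = (7, 2), b_2 = (8, 7); c = 3 b_1 - 2 b_2 + (1, -1).
coeffs, residual = nearest_plane([[7, 2], [8, 7]], [6, -9])
print(coeffs, residual)  # coefficients [3, -2]; the residual equals the error (1, -1)
```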

As was pointed out to us by Don Coppersmith, this attack can be improved in practice in several
different ways: 


• Instead of picking the vectors in the order induced by LLL, we can pick them according to the Euclidean norm of the corresponding rows of $B^{-1}$ (smallest norm first). As we showed in the analysis of the brute-force attack, this choice maximizes the probability that the coefficients obtained by rounding are really the right coefficients.


• We can apply a lattice-reduction procedure to the remaining basis vectors after each iteration (or once every few iterations). This improvement is particularly useful since the performance of the lattice-reduction algorithm improves rapidly as the dimension decreases. Also, we can round more than one coefficient at a time (if there are several vectors for which the corresponding rows of $B^{-1}$ have small norm).


• If all the rows in $B^{-1}$ have a large Euclidean norm, we can apply an exhaustive search similar to our brute-force attack to the few entries which have the smallest Euclidean norm. That is, we try to continue the same algorithm for each plausible setting of these entries. Since we only search over a few entries (and we picked the ones with the smallest norm), we expect that the size of this exhaustive search will be rather small.


To defeat this attack, we must make the dimension of the original lattice large enough so that all the rows in $B^{-1}$ will have large Euclidean norm. Although we did not perform extensive tests of
this attack, the data that we have so far indicates that when using LLL as our lattice-reduction 
algorithm, we need to do some exhaustive search even for dimension 120. It seems that when 
using LLL, this attack is infeasible for dimensions above 140. We still do not have data about the 
performance of this attack using better lattice-reduction algorithms. 


4 Encryption Scheme 


Our public-key encryption scheme is based on our candidate one-way trapdoor function in the usual
way. That is, to encrypt a message we embed it inside the argument to the function, compute the 
function and the result is the ciphertext. To decrypt, we use the trapdoor information to invert 
the function and extract the message from the argument. 

Recall from Section 3 that, at a high level, our one-way trapdoor function takes a lattice vector and adds to it a small error vector. In the context of an encryption scheme, we say that we ‘encrypt a lattice vector’ by adding to it a small error vector, and the resulting vector in $\mathbb{R}^n$ is the ciphertext. To encrypt arbitrary messages, we specify an (easily invertible) encoding which maps messages into lattice vectors, which are then encrypted as above. Decrypting the ciphertext amounts to solving a particular type of CVP instance, which was discussed in Section 3. In a nutshell, the encryption scheme can be described as follows (using the algorithms GENERATE, SAMPLE, EVALUATE and INVERT from the description of our trapdoor function).


Generating Keys. On security parameter $1^n$, run algorithm GENERATE($1^n$) to get the triplet $(B, R, \sigma)$. We let the public key be $(B, \sigma)$ and the secret key be $(R^{-1}, T)$, where $T = B^{-1}R$.


Encryption. On input message $s$ and public key $(B,\sigma)$, we first apply some (randomized) encoding function $v \leftarrow Enc(s)$ to encode $s$ as a vector $v \in \mathbb{Z}^n$. We note that this encoding is in fact the only component of the encryption scheme which is not directly implied by the trapdoor function construction. We discuss this encoding function in Section 4.2. (For now, we let $Enc$, $Dec$ denote a pair of public and easy-to-compute functions such that $Dec(Enc(s)) = s$.)

Once we have computed $v$, we pick at random an “error vector” $e \in \mathbb{R}^n$ according to the distribution induced by the SAMPLE algorithm from Section 3. We then apply the function $f_{B,\sigma}$ to $v$ and $e$ to get the ciphertext $c \leftarrow f_{B,\sigma}(v,e) = Bv + e$.

The operations involved in encrypting a message are therefore: (1) encoding it as an integer vector; (2) choosing a random vector; and (3) performing one matrix-vector multiplication and one vector addition. Thus we have an $O(n^2)$ algorithm for encryption (where $n$ is the dimension we work in).


Decryption. To decrypt $c$ we use the private key to invert the function $f_{B,\sigma}$ by setting $v \leftarrow T\lceil R^{-1}c\rfloor$. We then extract the message $s$ from the vector $v$ by setting $s = Dec(v)$. Decrypting a message amounts to two matrix-vector multiplications and one rounding operation on a vector. Thus we also have an $O(n^2)$ algorithm for decryption.
Detecting decryption errors. One property of the above decryption procedure is that although there is a probability of error, it is still possible to verify when the message is decrypted correctly. This enables the legitimate user to identify decryption errors, so that it can take measures to correct them. Recall that we encrypt the lattice point $p$ by adding to it a small error vector $e$, thus obtaining the ciphertext $c = p + e$. When we decrypt $c$ and find a lattice point $p'$ (which we hope is the same as $p$), we can verify that this is the right lattice point by checking that the error $e' = c - p'$ is indeed small. For example, if we pick the error vector so that it never contains any entry larger than $\sigma$ in absolute value, then we can check that $|e'_i| \leq \sigma$ in each component. Thus we get


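Fact 4.1: If the underlying lattice does not contain any non-zero vector with $L_\infty$ norm at most $2\sigma$, then decryption errors can always be detected. (If two different lattice points were both within $L_\infty$ distance $\sigma$ of $c$, their difference would be a non-zero lattice vector of $L_\infty$ norm at most $2\sigma$.)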
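The check behind Fact 4.1 is a single pass over $e' = c - Bv'$. Here is a Python sketch, assuming the sampling rule that every entry of $e$ has absolute value at most $\sigma$. The toy basis and vectors are illustrative.

```python
def decryption_verified(B, c, v, sigma):
    """Accept v only if every entry of e' = c - B v has absolute value at most sigma."""
    Bv = [sum(a * b for a, b in zip(row, v)) for row in B]
    return all(abs(ci - bi) <= sigma for ci, bi in zip(c, Bv))


B = [[7, 8], [2, 7]]
c = [6, -9]  # = B (3, -2) + (1, -1)
print(decryption_verified(B, c, [3, -2], sigma=1))  # True: correct lattice point
print(decryption_verified(B, c, [4, -2], sigma=1))  # False: e' = (-6, -3) is too large
```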


Plaintext Awareness. It seems that our scheme enjoys some weak notion of “plaintext aware- 
ness” in that there is no obvious way to generate from scratch a valid ciphertext (i.e., one which 
the decryption algorithm can decrypt) without knowing the corresponding lattice point. Still this 
plaintext awareness is limited, since after seeing one valid ciphertext c, it is possible to generate 
other valid ciphertexts without knowing the corresponding lattice-points (simply by adding any 
lattice point to c). 


4.1 Partial Information Attacks 


In addition to the attacks on the underlying trapdoor function which were outlined in Section 3.5, there are types of attacks which only make sense in the context of an encryption scheme. Namely, rather than trying to recover the original message itself, the attacker can instead try to extract from the ciphertext some partial information about the message (e.g., the value of a specific bit in it). One way in which such partial-information attacks can be mounted against this scheme is as follows:

Recall that the ciphertext is computed as $c = Bv + e$ and therefore $B^{-1}c = v + d$, where $d$ is defined as $d = B^{-1}e$. Thus, the $i$'th entry of $B^{-1}c$ is equal to $v_i + \delta_i$ (where $v_i, \delta_i$ are the $i$'th entries in $v, d$ respectively). We saw in Section 3.2 that if the Euclidean norm of the row $\vec b_i$ in $B^{-1}$ is small, then the variance of $\delta_i$ will also be small (notice that the dual orthogonality defect of $B$ may still be large because of other rows in $B^{-1}$ that have much larger Euclidean norm). In particular, if $\sigma\cdot\|\vec b_i\| < 1$ then there is a reasonable probability that $|\delta_i| < 1/2$, in which case $v_i$ is just the $i$'th entry of the rounded vector $\lceil B^{-1}c\rfloor$.

Thus, an attacker could just focus on the rows of $B^{-1}$ which have low Euclidean norm, and try to compute the corresponding entries in $v$. Knowing some of the entries in $v$ may, in turn, give some partial information about the message $s$. More generally, the adversary may view the $i$'th entry of $B^{-1}c$ as an estimate for $v_i$ (which is probably accurate up to $\sigma\|\vec b_i\|$), and use this partial knowledge about the entries in $v$ to obtain some partial knowledge about $s$.

Somewhat surprisingly, for the purpose of this attack, reducing the basis $B$ does not seem to help (of course, as long as the resulting basis is not “reduced enough” to break the underlying trapdoor function). To see why, consider the unimodular transformation $T'$ between the original basis $B$ and the reduced basis $B'$, namely $T' = (B')^{-1}B$. Since $c$ is computed using the original matrix $B$, when trying to extract partial information using $B'$ we compute

$$v' = (B')^{-1}c = (B')^{-1}(Bv + e) = (B')^{-1}Bv + (B')^{-1}e = T'v + (B')^{-1}e$$


If $(B')^{-1}$ has rows with small Euclidean norm, then the attacker may be able to learn the corresponding entries in $T'v$, but this still does not seem to yield an estimate of any entry in $v$. It follows that in this encryption scheme, it may be useful to publish a public basis which is not LLL-reduced.

In any case, foiling the partial-information attacks requires a careful design of the encoding scheme $v \leftarrow Enc(s)$, so that the partial information that can be revealed about $v$ will not yield partial information about $s$. This is discussed next.


4.2 Encoding messages as vectors in $\mathbb{Z}^n$


In this section we discuss ways to encode messages as vectors in $\mathbb{Z}^n$. As we mentioned above, we would like to have an encoding scheme such that knowing a few entries in $v$ exactly and knowing some rough estimate of all the other entries still yields “almost no information” about $s$.

In choosing an encoding function, there are other parameters (besides security) which need to 
be considered. Perhaps the most important of them is to obtain high bandwidth: Since for every 
encryption operation we end up sending the vector c = Bv + e, we would like to use as much of 
this bandwidth as possible for message bits. 


The Trapdoor Function Paradigm: Using hard bits. The first approach is a generic one. Since we have a candidate for a trapdoor one-way function, we may use hard-core bits of this function as the message bits. In particular, we can use the general construction of Goldreich and Levin [GL84], which shows how and where to hide hard-core bits in a pre-image of any one-way function. (This construction enables hiding $\log n$ bits in one function evaluation.)

This approach has the advantage that we can prove that it is impossible even to distinguish in polynomial time between any two messages, under the assumption that we started with a trapdoor function. The major drawback is in terms of bandwidth, since we can only send $\log n$ bits per function evaluation. Moreover, since this approach is generic, it does not provide us with any insight which we may exploit to increase the bandwidth.


Using the low-order bits in $v$. Another approach is to embed the bits of $s$ directly in the vector $v$. Since an attacker can get an estimate of the entries in $v$, it is clear that we need to embed $s$ in the least significant bits of these entries. Also, the fact that the attacker may be able to learn some of the entries in $v$ exactly implies that we should not put any bits of $s$ in those entries. Note that we know in advance which are the “weak entries”, since these correspond to the rows in $B^{-1}$ with small Euclidean norm.

We start by examining the simple case in which we only use the least significant bit of each entry (except for the “weak entries”), and pick all the other bits at random. Then, given an estimate $\tilde v_i = v_i + \delta_i$ for the entry $v_i$, the attacker should decide whether the number in that entry was even or odd (that is, whether the message bit is 0 or 1).
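To see how quickly such an advantage vanishes, the following Python sketch assumes that the error $\tilde v_i - v_i$ is Gaussian, $N(0, s^2)$ with $s = \sigma\|\vec b_i\|$. It computes the advantage of one natural strategy: guessing that $v_i$ has the parity of the integer nearest to $\tilde v_i$. This strategy is chosen for illustration only and is not an analysis from the text. A short Fourier-series calculation (ours) shows that this advantage is approximately $(4/\pi)e^{-\pi^2 s^2/2}$, i.e. exponentially small in $s^2$.

```python
import math


def parity_advantage(s):
    """|Pr[round(delta) even] - Pr[round(delta) odd]| for delta ~ N(0, s^2).

    This is the advantage of guessing the parity of v_i as the parity of round(v_i + delta).
    """
    K = int(10 * s) + 10  # truncate the sum; the Gaussian tail beyond K is negligible

    def cdf(x):  # Gaussian CDF with standard deviation s
        return 0.5 * math.erfc(-x / (s * math.sqrt(2.0)))

    total = 0.0
    for k in range(-K, K + 1):
        mass = cdf(k + 0.5) - cdf(k - 0.5)  # Pr[round(delta) = k]
        total += mass if k % 2 == 0 else -mass
    return abs(total)


for s in (0.25, 0.5, 1.0, 2.0):
    print(s, parity_advantage(s), 4 / math.pi * math.exp(-math.pi**2 * s**2 / 2))
```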

If we assume that each entry $\tilde v_i$ can be approximated by a Gaussian random variable with mean $v_i$ and variance $\sigma^2\|\vec b_i\|^2$ (which is reasonable since $\tilde v_i - v_i = \delta_i$ is a sum of $n$ independent random variables which are all “more or less the same”), then given the observed value $\tilde v_i$, the statistical advantage $|\Pr[v_i \text{ is even} \mid \tilde v_i] - \Pr[v_i \text{ is odd} \mid \tilde v_i]|$ is exponentially small in $\sigma\|\vec b_i\|$. Thus, if the Euclidean norm of $\vec b_i$ is large enough, then the attacker, who knows $\tilde v_i$, gets only a small statistical advantage in guessing the corresponding bit of $s$. If we have a row of $B^{-1}$ with very high Euclidean norm, then we may be able to use the corresponding entry of $v$ for $\ell$ message bits. It can be shown that the statistical advantage in guessing any of these bits is at most exponentially small in $\sigma\|\vec b_i\|/2^{\ell}$. If the Euclidean norm of each individual row in $B^{-1}$ is too small, we can represent each bit of $s$ using several entries by making that bit the XOR of the least significant bits of all those entries.
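The least-significant-bit encoding with XOR spreading can be sketched in Python as follows. The function names, the range of the random entries, and the choice of consecutive non-weak entries as the group for each bit are assumptions for illustration. The set `weak` stands for the indices whose rows of $B^{-1}$ have small Euclidean norm.

```python
import random


def encode_lsb(bits, n, weak, group, rng, bound=2**15):
    """Hypothetical encoder: bit t is the XOR of the LSBs of `group` non-weak entries of v."""
    slots = [i for i in range(n) if i not in weak]  # never put message bits in weak entries
    if len(bits) * group > len(slots):
        raise ValueError("message too long for this n and group size")
    v = [rng.randint(-bound, bound) for _ in range(n)]  # all other bits are random
    for t, bit in enumerate(bits):
        idx = slots[t * group:(t + 1) * group]
        if sum(v[i] & 1 for i in idx) % 2 != bit:
            j = idx[0]
            v[j] += 1 if v[j] < bound else -1  # flip the parity of one entry in the group
    return v


def decode_lsb(v, n, weak, group, nbits):
    """Recover each bit as the XOR (sum mod 2) of the LSBs in its group."""
    slots = [i for i in range(n) if i not in weak]
    return [sum(v[i] & 1 for i in slots[t * group:(t + 1) * group]) % 2 for t in range(nbits)]


rng = random.Random(7)
msg = [1, 0, 1, 1, 0]
v = encode_lsb(msg, n=20, weak={0, 3}, group=3, rng=rng)
assert decode_lsb(v, 20, {0, 3}, 3, len(msg)) == msg
```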


Reducing mod 2. Notice that using only the least significant bits for the bits of $s$ is really a linear operation, since we can write $v = s + 2r$, where $s$ is the $\{0,1\}$ vector with the message bits and $r$ is a random integer vector. Therefore, when using this encoding we should consider attacks in which all the matrices involved are reduced modulo 2, and the attacker tries to compute the vector $v \bmod 2$.

Namely, we have $c = Bv + e = Bs + 2Br + e$, so when we reduce the last equation mod 2 we get $c \equiv Bs + e \pmod 2$. We can now compute the inverse of $B$ mod 2 over $\mathbb{Z}_2$. If such an inverse exists, denote it by $B_2^{-1}$. In this case we get $B_2^{-1}c \equiv s + B_2^{-1}e \pmod 2$. Notice, however, that $e \bmod 2$ is a random binary vector which is 1 with probability 07, and so, for each entry $\delta_i$ of $d = B_2^{-1}e \bmod 2$, the statistical difference $|\Pr[\delta_i = 0] - \Pr[\delta_i = 1]|$ is exponentially small in $n$.
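The mod-2 attack only needs linear algebra over $\mathbb{Z}_2$. The Python sketch below computes $B_2^{-1}$ by Gauss-Jordan elimination modulo 2. For an integer error vector $e$, it then checks that $B_2^{-1}c \bmod 2$ equals $s$ masked by $B_2^{-1}e \bmod 2$. The $3\times 3$ matrix and the vectors are toy assumptions.

```python
def inverse_mod2(M):
    """Inverse of an integer matrix reduced mod 2 (over Z_2), or None if it is singular."""
    n = len(M)
    A = [[x % 2 for x in row] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            return None
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                A[r] = [a ^ b for a, b in zip(A[r], A[col])]  # row addition mod 2
    return [row[n:] for row in A]


def matvec_mod2(M, x):
    return [sum(a * b for a, b in zip(row, x)) % 2 for row in M]


B = [[3, 2, 4], [1, 5, 2], [2, 1, 7]]
s, r, e = [1, 0, 1], [2, -1, 3], [1, -1, 0]  # message bits, random r, integer error
v = [si + 2 * ri for si, ri in zip(s, r)]    # v = s + 2r
c = [sum(a * b for a, b in zip(row, v)) + ei for row, ei in zip(B, e)]  # c = Bv + e
B2inv = inverse_mod2(B)
guess = matvec_mod2(B2inv, c)                # = s + B2inv e (mod 2)
mask = matvec_mod2(B2inv, e)
assert guess == [(si + mi) % 2 for si, mi in zip(s, mask)]
```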


5 Signature scheme 


In this section we describe a slight modification of our trapdoor function which is more suitable for a signature scheme, and provide an initial assessment of its properties. In this signature scheme, just like in the encryption scheme, the user uses its private basis $R$ to find lattice points which are close to some given vectors in $\mathbb{R}^n$. In this scheme, we “sign a vector in $\mathbb{R}^n$” by providing a lattice point which is “rather close” to that vector. The public key for the signature scheme contains a public basis $B$ for the lattice, and a threshold $\tau > 0$ which defines how close the lattice point should be to the given vector. The choice of $\tau$ is discussed in Section 5.3.1.


5.1 Operation 


The key-generation procedure amounts to the generation of two bases (as in the GENERATE procedure of the trapdoor function) and to the determination of the threshold $\tau$.


Signature. To sign a message $s$, we first need to interpret $s$ as a vector in $\mathbb{R}^n$. For this we use some encoding function to get $u \leftarrow Enc(s)$ (see Section 5.3.2). Then, using the private key $(R^{-1}, T)$, we apply exactly the same procedure as for decrypting a message, namely compute $v \leftarrow T\lceil R^{-1}u\rfloor$. The vector $v$ is the signature on $s$. The signing time is $O(n^2)$, just like for encryption, provided that the encoding time is so bounded.

Notice that $v$ is an integral vector, which we view as a representation of the lattice point $p = Bv$. The reason that we expect $p$ to be “rather close” to $u$ is that the representation of $u$ as a linear combination of the columns of $R$ is $R^{-1}u$, while the representation of $p$ as a linear combination of the columns of $R$ is $\lceil R^{-1}u\rfloor$ (since $p = BT\lceil R^{-1}u\rfloor = R\lceil R^{-1}u\rfloor$), and these two representations differ by at most a half in each coordinate. We discuss this further in Section 5.3.1.


Verification. To verify a signature $v$ on message $s$ w.r.t. the public key $(B,\tau)$, we compute the vectors $u \leftarrow Enc(s)$ and $p \leftarrow Bv$, and check that the Euclidean distance between them is less than $\tau$. Namely, the verification process consists of checking the inequality $\|Enc(s) - Bv\| < \tau$. This process also takes time $O(n^2)$, again provided that the encoding time is so bounded.
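Signing and verification reuse the same round-off step as decryption. Here is a minimal Python sketch on a toy key chosen for illustration: private basis $R = \begin{pmatrix}7&1\\2&5\end{pmatrix}$, public basis $B = \begin{pmatrix}7&8\\2&7\end{pmatrix}$, and $T = B^{-1}R$. The encoded vector $u = Enc(s)$ is taken as given, and the threshold value is an arbitrary assumption.

```python
import math


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def sign(Rinv, T, u):
    """Signature v = T * round(R^{-1} u), where u = Enc(s) is a real vector."""
    return matvec(T, [math.floor(x + 0.5) for x in matvec(Rinv, u)])


def verify(B, u, v, tau):
    """Accept iff the Euclidean distance ||u - B v|| is less than tau."""
    return math.dist(u, matvec(B, v)) < tau


Rinv = [[5 / 33, -1 / 33], [-2 / 33, 7 / 33]]
T, B = [[1, -1], [0, 1]], [[7, 8], [2, 7]]
u = [6.3, -8.8]
v = sign(Rinv, T, u)
print(v, verify(B, u, v, tau=2.0))  # [3, -2] True; ||u - Bv|| is about 1.53
```

Because signing is a rounding step, a nearby vector such as $u' = (6.31, -8.79)$ receives the same signature. This is the “metric preserving” behaviour discussed in Section 5.2.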




5.2 On the analog nature of the scheme 


We note that because of its “analog nature”, our scheme has some properties which are very different from those of other known signature schemes. In particular, notice that if $u, u'$ are two vectors in $\mathbb{R}^n$ which are very close to each other (much closer than the threshold $\tau$), then it is very likely that a signature on $u$ will also be a signature on $u'$. This “metric preserving” property suggests different signing procedures for digital versus analog data.

If we are signing digital data, then we should make sure that a signature on one message cannot be used to obtain a signature on another message. This can be achieved by using a “good hash function” to hash the message before we interpret it as a vector in $\mathbb{R}^n$ (or, alternatively, as the means to map messages to such vectors). If the hash function is indeed good enough, it will ensure that even if two messages to be signed are close to each other at the outset, they will be far apart after being hashed and thus be mapped to different signatures. Note that the hash-and-sign paradigm is what is necessary, and in fact what is done in practice, when using the RSA and DSA signature schemes. The reason is to ensure the difficulty of forging signatures on messages related to those previously signed by a legitimate user, a forgery which is otherwise easy for both “bare” RSA and “bare” DSA.

On the other hand, the “metric preserving” property may be useful when signing analog data such as music, speech, images, etc. When employing a traditional digital signature scheme on such data, the natural procedure is first to sample the data so as to obtain a digital representation of it, and then to apply the signature scheme to this digital representation. This procedure has the disadvantage of potentially mapping close analog signals to different (yet close) digital representations. In particular, minor changes in either the sampling process or the analog signal itself may result in a different digital representation. Consequently, the signature may not be valid when the analog signal changes a little. Thus, a method such as ours, where the analog signal may be signed directly, has the advantage of supplying signatures which remain valid (or at least meaningful) under small changes of the analog signal.

Note that the above discussion depends on the encoding of data as vectors in ℝ^n. Each of the two settings calls for a different type of encoding. In the “digital” setting we wish the encoding to scramble messages so as to destroy any structure (e.g., related messages should yield unrelated encodings). In the “analog” setting we want the encoding to preserve the metric of the data space (e.g., close analog signals should yield close encodings in ℝ^n). For further details see Section 5.3.2.


5.3 Various choices


In addition to the choices made for the process of selecting the private and public bases (discussed in Section 3), there are two important choices to be made: firstly, we need to determine the threshold parameter τ, and secondly, we need to determine the method of encoding data as vectors in ℝ^n.


5.3.1 Choosing the threshold 


In this section we show how the threshold τ should be chosen so that the signing algorithm is successful with high probability; in Section 5.4 we examine the effect of this choice on the security of the signature scheme.

In the analysis below we use the following notation. Let A be a basis for some lattice in ℝ^n. We denote by ROUND_A(u) the lattice point which is generated from u by considering the representation of u as a linear combination of the vectors in A and rounding the coefficients to the nearest integers. That is, ROUND_A(u) = A⌈A^{-1}u⌋.
Consider now a random vector u ∈ ℝ^n, and let us evaluate the distance between u and ROUND_R(u), where R is the private basis. Recall that conceptually, the lattice point ROUND_R(u) is the signature on the vector u (though the actual signature is the representation of that point w.r.t. the public basis B). Define the “error vector”


$$e = \lceil R^{-1}u \rfloor - R^{-1}u$$

that is, the i'th entry e_i in e is the difference between the i'th coefficient in the representation of u as a linear combination of the vectors in R and the nearest integer. Then the distance between u and ROUND_R(u) is just the Euclidean norm of the vector Re, since ROUND_R(u) − u = R⌈R^{-1}u⌋ − RR^{-1}u = Re. Clearly, we have |e_i| ≤ 1/2 for all i, since this is just the difference between some real number and the nearest integer. This immediately gives us


Claim 5.1: Let R be the private basis used for signing and denote the maximum L_1 norm of any row in R by γ. If we set τ = γ√n/2, then the signing algorithm always succeeds (with probability 1).


Proof: Denote d = Re. We can write the i'th entry in d as δ_i = Σ_j ρ_ij e_j, where ρ_ij is the i,j'th entry in R and e_j is the j'th entry in e. Therefore |δ_i| ≤ Σ_j |ρ_ij e_j| ≤ (1/2) Σ_j |ρ_ij| ≤ γ/2, and so

$$\|d\| = \sqrt{\sum_i \delta_i^2} \le \sqrt{n\,(\gamma/2)^2} = \gamma\sqrt{n}/2$$

□
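A quick numerical sanity check of Claim 5.1 can be sketched in Python under our own assumptions: a hypothetical 6-dimensional cube-like private basis R = k·I + rand(±l), with k chosen large enough that R is strictly diagonally dominant (hence invertible), and random points u drawn from a box. The sketch computes ROUND_R(u) exactly and compares the largest observed distance ||u − ROUND_R(u)|| with γ√n/2.

```python
from fractions import Fraction
import math
import random


def solve(M, y):
    """Solve M x = y exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    A = [[Fraction(a) for a in row] + [Fraction(y[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def round_off(R, u):
    """ROUND_R(u) = R * round(R^{-1}u), computed exactly."""
    coeffs = [round(c) for c in solve(R, u)]
    return [sum(Fraction(a) * c for a, c in zip(row, coeffs)) for row in R]


def claim_5_1_bound(R):
    """gamma * sqrt(n) / 2, where gamma is the largest L1 norm of a row of R."""
    gamma = max(sum(abs(a) for a in row) for row in R)
    return gamma * math.sqrt(len(R)) / 2


rng = random.Random(1)
n, l, k = 6, 4, 30   # hypothetical; k - l > (n - 1) * l, so R is strictly diagonally dominant
R = [[(k if i == j else 0) + rng.randint(-l, l) for j in range(n)] for i in range(n)]
bound = claim_5_1_bound(R)

worst = 0.0
for _ in range(200):
    u = [Fraction(rng.uniform(-1000, 1000)) for _ in range(n)]
    p = round_off(R, u)
    worst = max(worst, math.sqrt(sum((a - b) ** 2 for a, b in zip(u, p))))
print(f"largest observed distance {worst:.2f}, Claim 5.1 bound {bound:.2f}")
```

The observed distances are typically well below the bound, which is the slack the approximate analysis below tries to exploit.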


As with the trapdoor function, we may choose to set a lower value for τ to get better security. We now describe an “approximate analysis” which enables us to estimate the failure probability for lower values of τ.

(As opposed to the situation with the trapdoor function, however, even these approximate estimates are not very good. Experimentally, we found that we can set the value of τ to be about half the value which we get from the analysis below.)

Recall that the distance between u and ROUND_R(u) is the Euclidean norm of the vector Re, where |e_i| ≤ 1/2 for all i. Moreover, if u is chosen uniformly at random from a large enough box in ℝ^n, then the distribution induced over the vector e is close to the uniform distribution over (−1/2, +1/2)^n.

To see this, notice that if we choose u uniformly from the parallelepiped {Σ_i y_i r_i : 0 ≤ y_i < 1} (where r_i is the i'th column of R), then the induced distribution on e is exactly uniform. Moreover, every large enough box in ℝ^n can be viewed as a union of many disjoint lattice translates of this parallelepiped, plus some “left over” volume. As the volume of the box increases, the fraction of this “left over” volume decreases. Thus, the induced distribution of the vector e gets closer to the uniform distribution.

Thus, to evaluate the distance between u and ROUND_R(u) when u is uniform in some large box in ℝ^n, we need to evaluate the Euclidean norm of the vector Re when e is uniform in (−1/2, +1/2)^n. We can write each entry in this vector as δ_i = Σ_j ρ_ij e_j, where ρ_ij is the i,j'th entry in R and e_j is the j'th entry in e.
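This approximate model can be simulated directly. The sketch below is our own illustration with a hypothetical random 40-dimensional basis whose entries lie in {−4, …, 4}: it draws e uniformly from [−1/2, 1/2)^n (the boundary is immaterial), computes ||Re||, and compares the empirical mean of ||Re||² with its exact value under the model, (Σ_{i,j} ρ_ij²)/12, which holds because the e_j are independent with mean 0 and variance 1/12.

```python
import math
import random


def sample_norms(R, trials, rng):
    """Draw e uniformly from [-1/2, 1/2)^n and return the list of norms ||Re||."""
    n = len(R)
    norms = []
    for _ in range(trials):
        e = [rng.random() - 0.5 for _ in range(n)]
        d = [sum(r * x for r, x in zip(row, e)) for row in R]   # d = Re
        norms.append(math.sqrt(sum(x * x for x in d)))
    return norms


rng = random.Random(0)
n, l = 40, 4
R = [[rng.randint(-l, l) for _ in range(n)] for _ in range(n)]   # hypothetical k = 0 basis
norms = sample_norms(R, 2000, rng)

# Each e_j has mean 0 and variance 1/12, so E||Re||^2 = (sum of squared entries of R) / 12.
predicted = sum(a * a for row in R for a in row) / 12
empirical = sum(x * x for x in norms) / len(norms)
gamma = max(sum(abs(a) for a in row) for row in R)
print(f"mean ||Re||^2: empirical {empirical:.1f}, predicted {predicted:.1f}")
print(f"max ||Re|| = {max(norms):.1f} vs. Claim 5.1 bound {gamma * math.sqrt(n) / 2:.1f}")
```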

Denote the largest entry (in absolute value) in R by ρ_max; then each of the random variables ρ_ij e_j is distributed in the interval [−ρ_max/2, +ρ_max/2] and has zero mean. Using the Hoeffding bound we conclude that for any δ > 0

$$\Pr\big[|\delta_i| > \delta n\big] < 2\exp\!\left(-\frac{2\delta^2 n}{\rho_{\max}^2}\right)$$


which implies that

$$\Pr\big[\|d\|^2 > \delta^2 n^3\big] \le \Pr\big[\exists i \text{ s.t. } \delta_i^2 > \delta^2 n^2\big] < 2n\exp\!\left(-\frac{2\delta^2 n}{\rho_{\max}^2}\right)$$


Therefore, to make the error probability less than ε, it is sufficient to set δ ≥ ρ_max · ln(2n/ε)/(2n), which means that the threshold is set to

$$\tau = \sqrt{\delta^2 n^3} \ge \frac{\rho_{\max}}{2}\,\ln(2n/\varepsilon)\,\sqrt{n} \qquad (2)$$


Typical numeric values. The value of the threshold which we obtain from Equation 2 with ε = 2^{-30} and n = 140 is

$$\tau = \frac{\rho_{\max}}{2}\,\ln(280 \cdot 2^{30})\,\sqrt{140} \approx 156\,\rho_{\max}$$

In our experiments we typically have ρ_max = 4, which implies that τ ≈ 625. If we are willing to settle for ε = 10^{-4}, then we can make τ ≈ 350. As we said above, we found experimentally that we can actually use a threshold value which is about half of what we get from these bounds. In particular, for the setting above we can set τ = 200 to get the error probability below 10^{-4}.
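The arithmetic behind these typical values can be reproduced by evaluating Equation (2) as stated above; the function name `threshold_eq2` is ours.

```python
import math


def threshold_eq2(rho_max, n, eps):
    """Equation (2) as stated: tau = (rho_max / 2) * ln(2n / eps) * sqrt(n)."""
    return rho_max / 2 * math.log(2 * n / eps) * math.sqrt(n)


print(round(threshold_eq2(1, 140, 2 ** -30), 1))   # 156.4, i.e. about 156 * rho_max
print(round(threshold_eq2(4, 140, 2 ** -30)))      # 625
print(round(threshold_eq2(4, 140, 1e-4)))          # 351, i.e. about 350
```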


5.3.2 Encoding messages as vectors in ℝ^n


Recall that in the above scheme, a lattice point p is considered a valid signature on a vector v if the two vectors are “close enough”. This means that the same lattice point p is valid with respect to many different vectors (in fact, all the vectors in a ball of radius τ centered at p). This fact has two implications: on one hand, we can allow many “slightly different” representations of the same “logical datum” without affecting the validity of the signature. On the other hand, vectors which represent different “logical data” must be very different from one another.


Signing analog data. As a simple example of analog data, consider attaching a digital signature to a FAX document (say, by printing a bar code containing the signature on the document itself). Clearly, in this case we cannot expect that the sender's digital representation of the document will be identical to the representation obtained by the receiver after the document is printed. However, suppose that we could represent the “contents” of the FAXed document using some small set of parameters, in such a way that


• Printing and re-scanning the document does not change its parameters very much; and


• Documents which contain different contents are represented by very different sets of parameters.


If we have such a representation, we could use these sets of parameters to represent a document as a vector in ℝ^n. Consequently, it will be very likely that a digital signature on some representation of the document will still be valid even after the document was printed and re-scanned. We will need to assume that such a representation is sufficiently rich, in the sense that documents of interest result in representations spread over a sufficiently large box of ℝ^n. (Clearly, signatures are easy to forge if documents of interest are all mapped to a small region of ℝ^n; carrying the argument to an extreme, we definitely do not want all documents to be mapped to within distance τ of the same lattice point.) Furthermore, it should be infeasible to obtain a meaningful document which matches a random vector in this large box of ℝ^n.
Signing digital data. When signing digital data, we do not have the “multiple representation” problem described above: there is a unique binary string which represents the logical datum. What we need is an encoding of binary strings as “random” points in ℝ^n. We may assume, without loss of generality, that the string has length n, since shorter and longer strings can be handled using well-known methods (such as padding and collision-free hashing, respectively). So what we need is a mapping of {0,1}^n to ℝ^n which does not map two different strings too close to one another (i.e., to within proximity τ). This is very easy to do. However, we want the range of this mapping to be sufficiently “random” so that finding a close lattice point will be hard for these mapping images.
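One way such an encoding might look, purely as a hypothetical sketch and not the encoding used in this work: derive each coordinate from SHA-256 of the coordinate index and the message, and reduce it into a large box [−M, M)^n. Unlike the text, the sketch accepts byte strings of any length directly, and the box size M = 10^6 is an arbitrary choice.

```python
import hashlib
import math


def encode_digital(message: bytes, n: int, box: int) -> list:
    """Hypothetical encoding of a byte string as a pseudo-random point in [-box, box)^n.

    Coordinate i is derived from SHA-256(i || message), so related messages map to
    unrelated vectors. Illustrative only; not the encoding used by the authors.
    """
    coords = []
    for i in range(n):
        digest = hashlib.sha256(i.to_bytes(4, "big") + message).digest()
        x = int.from_bytes(digest[:8], "big")    # 64 pseudo-random bits
        coords.append(x % (2 * box) - box)       # reduce into [-box, box)
    return coords


u1 = encode_digital(b"pay Alice 100", n=140, box=10**6)
u2 = encode_digital(b"pay Alice 101", n=140, box=10**6)
print(math.dist(u1, u2))   # expected to be far above a threshold such as tau = 200
```

Two messages differing in one character land millions of units apart, so neither can reuse the other's signature.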


5.4 Security of the Signature Scheme 


To get some initial indication of the security of this scheme, we consider what happens when we try to execute the signing algorithm using the public basis B. Here we do not even have an approximate analysis. Instead, we conducted experiments to evaluate how close to the threshold we can get when using the public basis for signing. For the same setting as the “typical numeric values” in Section 5.3.1 (n = 140, maximum entry in R = 4), we got distances which were all above 520 (we tried 5 different LLL-reduced bases, with 10000 “signatures” for each basis). This suggests that for these parameters, picking the threshold at τ = 200 may be good enough to counter this attack, at least when using LLL as our lattice-reduction algorithm.


6 Experimental Results 


We performed several experiments in order to measure the effect of various parameters in the basis 
generation process on the security of our scheme. Since, as we described in Section 3.5, the security 
of the scheme is related to the dual-orthogonality-defect of the bases involved, we view the ratio 
between the dual-orthogonality-defect of the public and private bases as our “measure of security”. 


Testing methods. For our experiments we used an implementation of the LLL lattice-reduction algorithm due to the LiDIA group [Li95]. In each experiment, we chose a pair of private and public bases and evaluated the ratio between their dual-orthogonality-defects. We generated the public basis from the private one by mixing it (as described in Section 3.3) and LLL-reducing the result. To gain some confidence in our results, we repeated this experiment several times for each setting of the parameters.


• For each private basis, we generated five public bases and used the ratio between the minimum dual-orthogonality-defect of these public bases and the dual-orthogonality-defect of the private basis as the “security-level” of this private basis.


• For each setting of the parameters, we generated seven private bases with these settings and considered the median “security-level” of these seven bases.
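The aggregation in the two bullets above can be written as a short Python sketch. Here `make_private`, `make_public` and `defect` are hypothetical callables standing in for private-basis generation, mixing followed by LLL reduction, and the dual-orthogonality-defect of Section 3.5; the usage below stubs them with toy numbers purely to show the min-then-median structure.

```python
import random
import statistics


def security_level(private, make_public, defect, publics=5):
    """Min over several public bases of defect(public), divided by defect(private)."""
    d_private = defect(private)
    return min(defect(make_public(private)) for _ in range(publics)) / d_private


def setting_score(make_private, make_public, defect, privates=7, publics=5):
    """Median security level over several private bases generated with one setting."""
    levels = [security_level(make_private(), make_public, defect, publics)
              for _ in range(privates)]
    return statistics.median(levels)


# Toy stand-ins (hypothetical): a "basis" is just a number and defect() is the identity,
# so each security level is the minimum of five multipliers drawn from [10, 20].
rng = random.Random(3)
score = setting_score(make_private=lambda: rng.uniform(1, 2),
                      make_public=lambda b: b * rng.uniform(10, 20),
                      defect=lambda b: b)
print(score)
```

Taking the minimum over public bases makes each security level pessimistic, while the median over private bases damps outliers in either direction.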


Parameters. The parameters which we tested are 


1. The dimension of the lattice, denoted by n. We performed most of the tests in dimensions 
80-120. 


2. The range of integers ({−l, …, +l}) from which we choose the entries in the private basis. Below we refer to this range as the ‘l-parameter’ of the private basis.
3. How “cube-like” the private basis is. Namely, we generated the private basis as R = k·I + rand(±l) for several values of k (where I is the identity matrix and rand(±l) is a random matrix with entries in {−l, …, +l}). Below we refer to this parameter as the ‘k-parameter’ of the private basis.


4. How many “mixing steps” are used to generate the public basis from the private one. 
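A sketch of how a private basis with given l- and k-parameters might be generated, following R = k·I + rand(±l) from item 3. We assume that rand(±l) draws each entry independently and uniformly from {−l, …, +l}, and we read the bracket in k = l⌈1 + √n⌉ (Section 6.1) as a ceiling; the text states neither explicitly.

```python
import math
import random


def private_basis(n, l, k, rng):
    """R = k*I + rand(+-l), with each random entry uniform in {-l, ..., +l} (assumed)."""
    return [[(k if i == j else 0) + rng.randint(-l, l) for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
n, l = 80, 4
k = l * math.ceil(1 + math.sqrt(n))      # largest k tested in Section 6.1 (here 40)
R = private_basis(n, l, k, rng)

# Typical length of a row of rand(+-l): E[x^2] = l(l+1)/3 for x uniform in {-l..l},
# so the length is about sqrt(n * l * (l + 1) / 3), i.e. on the order of l * sqrt(n).
typical = math.sqrt(n * l * (l + 1) / 3)
print(f"k = {k}, typical random-row length ~ {typical:.1f}, l*sqrt(n) = {l * math.sqrt(n):.1f}")
```

Comparing k with the typical row length of rand(±l) shows why k is expressed in units of l√n: once k dominates that length, R becomes close to a scaled identity.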


6.1 Generating the Private Basis 


We first measured the effects of the parameters involved in choosing the private basis, namely the lattice dimension (n), the range of integers (l) and the “cube-likeness” (k). For each setting of k, l, we tested dimensions 80 through 120 (in increments of 10).


Entry size (l). We tested the l-parameter settings of 1, 4 and 10, working with both “random lattices” (k = 0) and “cube-like lattices” (k = l⌈1 + √n⌉). The results of these experiments are summarized in Figure 1. In all these experiments, we applied 2n “elementary mixing steps” to the private bases and LLL-reduced the result to obtain the public basis (see Section ??). As can be seen from these results, the l-parameter had no effect on the “security-level” of the bases which we obtained.


“Cube-like” parameter (k). The settings of the k-parameter which we tested are k = 0, k = ½l⌈1 + √n⌉ and k = l⌈1 + √n⌉. The reason that we express k in “units” of l√n is that the expected length of a random vector in {−l, …, +l}^n is O(l√n). We tested these settings with l = 1 and l = 4. The results are summarized in Figure 2. Varying the value of k had the following effects:


• Increasing the value of k increases the dimensions in which LLL can recover the private basis. For example, LLL could recover the private basis in dimension 80 when we set k = l⌈1 + √n⌉, but failed for the smaller values of k.


• When the dimension increases beyond some threshold, the ratio of the dual-orthogonality-defects becomes much larger for large values of k. The reason is that the dual-orthogonality-defect of the private basis becomes smaller (since the private basis is more “cube-like”). In fact, for k = l⌈1 + √n⌉, the dual-orthogonality-defect of the private basis is already very close to one. On the other hand, the dual-orthogonality-defect of the corresponding public basis is not affected by this change (since beyond some threshold dimension, LLL fails to take advantage of the “cube-likeness” of the lattice). Thus, the ratio between the dual-orthogonality-defects of the public and private bases increases considerably.


6.2 How Many Mixing Steps 


We also tested the number of “elementary mixing steps” which we apply to the private basis in order to get the public basis. In each elementary mixing step, we pick one of the basis vectors and add to it a random integral linear combination of the other vectors. In our experiments we chose the coefficients of this linear combination from {−1, 0, 1} with Pr[1] = Pr[−1] = 1/7. To make sure that we replace all the vectors in the private basis, we must make at least n mixing steps. To make sure that we hit them all, we chose a random permutation over {1, …, n} and picked the vectors according to the order in that permutation.
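The mixing procedure can be sketched as follows. Bases are lists of column vectors, and `rng.choices` with weights 1:5:1 gives Pr[1] = Pr[−1] = 1/7. How the permutation is reused when more than n steps are made (e.g., 2n) is not specified here, so the sketch assumes a fresh random permutation for each block of n steps. The determinant check confirms that each step is unimodular, so the resulting basis spans the same lattice.

```python
import random
from fractions import Fraction


def mix(basis, steps, rng):
    """Apply elementary mixing steps to a basis given as a list of column vectors.

    Each step adds to one vector a random combination of the other vectors with
    coefficients in {-1, 0, 1}, Pr[1] = Pr[-1] = 1/7. Vectors are visited in
    random-permutation order; a fresh permutation is drawn for every n steps
    (an assumption for steps > n).
    """
    cols = [list(c) for c in basis]
    n = len(cols)
    order = []
    for _ in range(steps):
        if not order:
            order = rng.sample(range(n), n)           # random permutation of 0..n-1
        i = order.pop()
        coeffs = rng.choices([-1, 0, 1], weights=[1, 5, 1], k=n)
        for j in range(n):
            if j != i and coeffs[j] != 0:
                cols[i] = [a + coeffs[j] * b for a, b in zip(cols[i], cols[j])]
    return cols


def det(cols):
    """Exact determinant via Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in c] for c in cols]
    n, d = len(A), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            d = -d
        d *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return d


rng = random.Random(7)
n = 8
R = [[(12 if i == j else 0) + rng.randint(-1, 1) for i in range(n)] for j in range(n)]
B = mix(R, 2 * n, rng)
print(det(R) == det(B))   # prints: True (mixing is unimodular)
```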

To evaluate how secure the resulting public basis is, we LLL-reduced it and compared the dual-orthogonality-defect of the result with that of the private basis. In our experiments we tried making n and 2n mixing steps before the LLL reduction. The results for 2n mixing steps (with various parameters of the private basis) are presented in Figures 1 and 2. The results we get when we only make n mixing steps (for l = 4 and k = 0, k = l⌈1 + √n⌉) are summarized in Figure 3.

[Figure 1: The effect of varying the entry size l for k = 0 (upper figure) and k = ⌈1 + √n⌉·l (lower figure).]

[Plot title: "R = k*I + rand(+-1). Applying 2*n mixing steps to get B"]

[Figure 2: The effect of varying the parameter k for l = 1 (upper figure) and l = 4 (lower figure). We tested the values k = 0, k = ½l⌈1 + √n⌉ and k = l⌈1 + √n⌉.]

[Plot title: "R = k*I + rand(+-4). Applying n mixing steps to get B"]

[Figure 3: Making only n mixing steps. Notice that for a cube-like lattice, we were able to recover the private basis in all the dimensions.]

It can be seen that when making only n mixing steps on a cube-like lattice, LLL was always 
able to recover the private basis. Another problem with making so few mixing steps (which is not 
reflected in Figure 3) is that the variance which we get for each setting of the parameters is much 
larger than what we get for 2n mixing steps. In fact, although the median results for k = 0 seem 
to increase exponentially with the dimension, the minimum results are very close to one even in 
dimension 120. 